ABSTRACT


Some Bayesian decision models which involve a finite Markov chain
with uncertain transition probabilities are studied in this report. The
principal theoretical features of these models are set forth and various
questions of numerical computation are considered.


It is assumed, for the most part, that the family of prior distributions
of the matrix of transition probabilities is closed under sampling. This
concept is defined and some properties of closed families of distributions
are obtained. It is shown that there are an arbitrarily large number of
such families, giving considerable generality to the entire study. A
discounted adaptive control model for a Markov chain with alternative
transition probabilities and rewards is then formulated as a set of
functional equations. These equations are shown to have a unique bounded
solution and a method of successive approximations is considered which
converges monotonically to this solution.


The means, variances, and covariances of the n-step transition
probabilities, the steady-state probabilities, the total discounted reward
vector, and the process gain are then considered. It is shown that,
under quite general conditions, the mean n-step transition probability
matrix approaches the matrix of steady-state probabilities as $n \to \infty$.
These results are applied to discounted terminal control models in which
a Markov chain with alternative transition probabilities and rewards
is sampled, at a cost, until a terminal decision point is reached. At
that time a terminal policy is chosen and the system is operated
indefinitely under this policy with no further sampling. It is shown
that a terminal decision point is reached with probability one under an
optimal sampling strategy. These models are formulated as functional
equations, which are shown to have a unique bounded solution, and
successive approximation techniques are investigated.


We then turn to fixed sample size analysis. The Whittle distribution,
the matrix beta distribution, and the beta-Whittle distribution are
introduced. It is assumed that a finite Markov chain with uncertain
transition probabilities is observed for n consecutive transitions and
the prior-posterior and preposterior analysis is developed.

CHAPTER 1
INTRODUCTION





The basic concept of a Markov chain was introduced by A. A. Markov
in 1907 and since that time the literature on the subject has grown
remarkably. Fundamental investigations by Kolmogorov in the 1930's
extended the mathematical theory to chains with an infinite number of
states; Doeblin and Doob made important contributions during the period
1935-1945. The present state of the theory of Markov chains is summarized
by Chung [12].

By 1950 it was well recognized that the Markov chain is a useful
model for a multitude of physical processes and an increasing number of
applications of the mathematical theory have been made to problems in
such fields as physics, chemistry, biology, and operations research. In
these applications it is generally assumed that the matrix of transition
probabilities is known, although, since 1954, questions of hypothesis
testing and maximum-likelihood estimation have been investigated. These
results are summarized by Billingsley [10], who gives extensive
references.

During the past two decades Savage's work on subjective probability
has stimulated interest in Bayesian decision theory. Contributions in
this area have been made by many researchers, including Von Neumann,
Wald, Blackwell, and Girshick, leading to the current work of Raiffa and
Schlaifer [33], which, to a large degree, presents a unified theory of
statistical decisions which is suitable for applications.
Recent research at the Massachusetts Institute of Technology [13, 14,
38] has been directed toward the application of Bayesian decision theory
to various models based on Markov chains with uncertain transition
probabilities. These efforts have demonstrated both the feasibility of
such decision models and the need for a more thorough investigation of the
underlying mathematical theory. The present work attempts to establish a
theoretical basis for some decision models which involve a finite Markov
chain with uncertain transition probabilities; particular attention is
given to sequential decision models. While we have dealt, for the most
part, with matters of existence and convergence, the question of numerical
computation has not been neglected. There are, however, many problems
of numerical computation in this area which are yet to be solved.

In 1953, L. S. Shapley [36], using a game-theoretic formulation,
considered one of the earliest sequential decision models in a Markov chain
with alternative transition probabilities, which were assumed to be known.
Similar game formulations have been examined more recently by Zachrisson
and Shor [37]. A more general class of Markovian decision models
is discussed by Howard [23] and Jewell [24, 25]. Further references are
given by Jewell [25].


Silver [30] has investigated various questions in a Markov chain
with uncertain transition probabilities and rewards. In particular, he
has treated the problem of a natural conjugate distribution for the
data-generating process of a Markov chain and has attempted to find the
expected value of certain functions of the transition probabilities, such
as the steady-state probability vector. These results assumed a specific
prior distribution for the transition probabilities, a generalization of
the beta distribution which we shall call the matrix beta distribution.
Many of Silver's results are generalized in the present work.















Cozzolino [13] has examined a sequential decision model involving a
two-state chain with uncertain transition probabilities. In a related
study, Cozzolino, Gonzalez-Zubieta, and Miller [14] have developed
heuristic methods for treating sequential decisions in a Markov chain with
uncertain transition probabilities. Their findings are based on Monte
Carlo studies.

The results of the present study are obtained under the assumption
that the prior distribution function of the matrix of transition
probabilities belongs to a family of distributions which is closed under
consecutive sampling. This concept is formally defined in Chapter 2,
where some properties of such families of distributions are derived.
In particular, it is shown that there are an arbitrarily large number of
such families, thus providing considerable generality to the entire study.
Additional generality is obtained by stating all theorems in terms of
distribution functions and Riemann-Stieltjes integrals, making them
applicable to both discrete and continuous prior distributions.*

* Cf. Section 6.3.

In Chapter 3 we consider a discounted adaptive control model in
which alternative transition probabilities in a Markov chain are sampled
over an infinite time period. The problem of choosing a sequence of policies
which maximizes the expected discounted reward over an infinite period
is formulated in terms of a set of functional equations. It is shown
that these equations have a unique solution and a method of successive
approximations which converges monotonically to this solution is considered.
Certain functions of the transition probabilities, such as the n-step
transition probabilities, the steady-state probabilities, the discounted
total reward, and the gain, are treated in Chapter 4, where we obtain
recursive equations for the means, variances, and covariances of these
quantities. An important result of this chapter is a proof that, under
quite general conditions, the mean n-step transition probability matrix
approaches the matrix of mean steady-state probabilities as $n \to \infty$.


These results are applied in Chapter 5, where discounted and
undiscounted terminal control models are studied. In these models of a
Markov chain with alternative transition probabilities the decision-maker
can sample various alternatives by paying a sampling cost. After
a certain amount of information about the process is gained in this
manner it becomes profitable for him to cease sampling and to choose a
policy under which the system operates indefinitely. These models are
formulated as functional equations and it is shown that, with probability
one, a terminal decision point is reached under an optimal sampling
strategy. We then show that there exists a unique solution to the
functional equations and investigate a method of successive approximations.
The results of the first six chapters are obtained for any prior
distribution function which belongs to a family closed under consecutive
sampling. In Chapters 6-8 we consider a specific distribution for the
transition probabilities which we call the matrix beta distribution. This
distribution is defined in Chapter 6 and its main properties are derived.
We also introduce, in this chapter, the Whittle distribution and the
beta-Whittle distribution. These probability distributions are utilized
in Chapter 7, where we do prior-posterior and preposterior analysis for
a Markov chain which is observed under the consecutive sampling rule. The
transition count is identified as a sufficient statistic and is shown to
have the Whittle distribution, conditional on a fixed value of the
transition probability matrix. The natural conjugate distribution for
this data-generating process is the matrix beta and the unconditional
distribution of the transition count is the beta-Whittle distribution.

In Chapter 8 we consider the results of Chapters 2-6 in the case of
a two-state Markov chain when the prior distribution of the transition
probabilities is matrix beta. Explicit formulas for the expectations
of various functions of the transition probabilities are given in terms
of the parameters of the matrix beta distribution.

The results of this study are summarized in Chapter 9 and areas for
future research are discussed.


The matrix with generic element $p_{ij}$ is denoted $P = [p_{ij}]$; the row
vector with generic element $p_i$ is written $p = (p_1, \ldots, p_N)$. The matrix
$P^t$ is the transpose of $P$.

A vector $x = (x_1, \ldots, x_N)$ is a point in the N-dimensional Euclidean
space, $E_N$, and we shall use the customary norm, or distance function,
$\|x\|$, defined by

$$\|x\| = \Big[ \sum_{i=1}^{N} x_i^2 \Big]^{1/2}. \tag{1.2.1}$$

Similarly, the M x N matrix $P$ is a point in $E_{MN}$ and has the norm

$$\|P\| = \Big[ \sum_{i=1}^{M} \sum_{j=1}^{N} p_{ij}^2 \Big]^{1/2}. \tag{1.2.2}$$

Random quantities are denoted by the tilde; thus, $\tilde{P}$, $\tilde{p}$, and $\tilde{x}$ are,
respectively, a random matrix, a random vector, and a random variable.

Let $h(P)$ be a scalar function of the M x N matrix $P$. Assume that
each row of $P$ is subject to the constraint

$$\sum_{j=1}^{N} p_{ij} = 1, \qquad i = 1, \ldots, M. \tag{1.2.3}$$

If $F(P)$ is a distribution function, the Riemann-Stieltjes integral of $h$
is to be interpreted as an $M(N-1)$-fold iterated integral over the
independent elements of $P$:

$$\int h(P)\, dF(P) = \int \cdots \int h(p_{11}, \ldots, p_{1,N-1}, p_{21}, \ldots, p_{M,N-1})\, dF(p_{11}, \ldots, p_{M,N-1}). \tag{1.2.4}$$

If $G(P) = [g_{ij}(P)]$ is a matrix-valued function of $P$, the Riemann-Stieltjes
integral of $G$ is to be interpreted as the matrix of the integrals of the
$g_{ij}$:

$$\int G(P)\, dF(P) = \Big[ \int g_{ij}(P)\, dF(P) \Big]. \tag{1.2.5}$$


1.2.1 Markov Chain with Alternatives

When we refer to a Markov chain with alternatives we mean the following
process. Let N be the number of states the system can occupy. When the
system is in state i, the decision-maker can choose one of $K_i$ alternative
transition vectors, $p_i^k = (p_{i1}^k, \ldots, p_{iN}^k)$, where $p_{ij}^k$ is the
probability that the system makes a transition to state j, given that it is
currently in state i and the kth alternative is used. The vectors $p_i^k$ are
stochastic vectors; that is,

$$p_{ij}^k \ge 0, \qquad k = 1, \ldots, K_i; \quad i, j = 1, \ldots, N, \tag{1.2.6a}$$

$$\sum_{j=1}^{N} p_{ij}^k = 1, \qquad k = 1, \ldots, K_i; \quad i = 1, \ldots, N. \tag{1.2.6b}$$

With each transition vector, $p_i^k$, is associated a reward vector,
$r_i^k = (r_{i1}^k, \ldots, r_{iN}^k)$, where $r_{ij}^k$ is the reward earned when the system
makes a transition from state i to state j under the kth alternative
($-\infty < r_{ij}^k < \infty$; $k = 1, \ldots, K_i$; $i, j = 1, \ldots, N$).

The transition vectors can be arranged in a K x N matrix, $\mathcal{P}$, where

$$K = \sum_{i=1}^{N} K_i. \tag{1.2.7}$$

Let the corresponding reward matrix be denoted by $\mathcal{R}$. The N x N
transition matrix corresponding to a specific policy, $\sigma$, will be denoted
by $P(\sigma)$ or, if no confusion can result, by $P$. The corresponding reward
matrix under policy $\sigma$ is $R(\sigma)$ or $R$. The set of all possible policy
vectors, $\sigma$, is denoted $\Sigma$ and is a finite set.
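
As an illustration of this notation, the following is a minimal Python
sketch of how a policy vector picks one alternative per state and thereby
selects an N x N stochastic matrix out of the K x N collection of
alternative transition vectors. The two-state data, the nested-list layout,
and the helper names `policy_matrix` and `all_policies` are assumptions made
for the example only; they are not taken from the report.

```python
from fractions import Fraction as Fr
from itertools import product

# Hypothetical chain with N = 2 states: state 1 has K_1 = 2 alternatives and
# state 2 has K_2 = 1, so K = 3. alternatives[i][k] is one transition vector.
alternatives = [
    [[Fr(1, 2), Fr(1, 2)], [Fr(9, 10), Fr(1, 10)]],
    [[Fr(1, 4), Fr(3, 4)]],
]


def is_stochastic(vec):
    """Non-negative entries that sum to one, as in (1.2.6a) and (1.2.6b)."""
    return all(p >= 0 for p in vec) and sum(vec) == 1


def policy_matrix(alternatives, sigma):
    """Row i of the result is the alternative sigma[i] (1-based) of state i."""
    return [alternatives[i][k - 1] for i, k in enumerate(sigma)]


def all_policies(alternatives):
    """Enumerate the finite set of policy vectors."""
    return list(product(*[range(1, len(a) + 1) for a in alternatives]))


P = policy_matrix(alternatives, (2, 1))
policies = all_policies(alternatives)
```

Here the policy (2, 1) uses the second alternative in state 1 and the only
alternative in state 2; the set of policies has $K_1 K_2 = 2$ elements.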

The matrix $\mathcal{P}$ can be regarded as the parameter of a Markov chain with
alternatives; uncertainty about $\mathcal{P}$ is expressed by regarding $\tilde{\mathcal{P}}$ as a
random matrix with a prior distribution function, $H(\mathcal{P} \mid \psi)$, which has the
parameter $\psi$. In general, $\psi$ is a point in a multidimensional Euclidean
space. The range set of $\tilde{\mathcal{P}}$ is the set of all K x N generalized stochastic
matrices, denoted $S_{K,N}$:

$$S_{K,N} = \Big\{ \mathcal{P} : \mathcal{P} \text{ is } K \times N,\ p_{ij}^k \ge 0,\ \sum_{j=1}^{N} p_{ij}^k = 1 \ (k = 1, \ldots, K_i;\ i, j = 1, \ldots, N) \Big\}. \tag{1.2.8}$$

We remark that $S_{K,N}$ is a closed and bounded, hence, compact, subset of
the KN-dimensional Euclidean space, $E_{KN}$. The distribution function,
$H(\mathcal{P} \mid \psi)$, is a function of the $K(N-1)$ independent elements of $\mathcal{P}$,
$p_{i1}^k, \ldots, p_{i,N-1}^k$, for $k = 1, \ldots, K_i$ and $i = 1, \ldots, N$. $H(\mathcal{P} \mid \psi)$ has the
usual properties of a multivariate distribution function; in particular,

$$\int_{S_{K,N}} dH(\mathcal{P} \mid \psi) = 1. \tag{1.2.9}$$

From $H(\mathcal{P} \mid \psi)$ can be obtained the marginal distributions of the N x N
stochastic matrices, $\tilde{P}(\sigma)$. The marginal distribution function of
$\tilde{P}(\sigma)$ is denoted $F_\sigma(P \mid \psi)$ or, when the dependence on $\sigma$ is clear, simply
$F(P \mid \psi)$. The range set of $\tilde{P}$ is $S_N$, the set of all N x N stochastic
matrices,

$$S_N = \Big\{ P : P \text{ is } N \times N,\ p_{ij} \ge 0,\ \sum_{j=1}^{N} p_{ij} = 1 \ (i, j = 1, \ldots, N) \Big\}. \tag{1.2.10}$$














CHAPTER 2 
FAMILIES OF DISTRIBUTIONS CLOSED
UNDER SAMPLING 
























Much of the discussion presented in the following chapters is
carried out under the assumption that $H(\mathcal{P} \mid \psi)$, the prior distribution
of $\tilde{\mathcal{P}}$, is a member of a family of distributions closed under a given
sampling rule. We formally define this concept in the present chapter
and derive some properties of such closed families of distributions
which will be used in the sequel.

The notion of a family of distributions closed under sampling is,
of course, not a new one. In Great Britain, G. A. Barnard [5] in 1954
and, more recently, G. B. Wetherill [40], have applied this concept to
sampling inspection problems. In this country, R. Bellman [6] and Bellman
and Kalaba [8] have used the idea in connection with adaptive control
processes. A particular class of distributions closed under sampling,
natural conjugate distributions, forms the basis of recent
research by Raiffa and Schlaifer [33] in statistical decision theory.
The properties of closed families of distributions which are derived
in this chapter and their application to decision problems in a Markov
chain with alternatives are original with the present work.


Consider a sequence of transitions within a Markov chain with
alternatives. A sampling rule is a set of specifications which determine
the following:

a. The distribution of the initial state of the chain and
the initial policy under which the process is operated.

b. The transitions at which policy changes occur. These
transitions may be determined probabilistically.

c. The distribution of the new policy when a policy change
occurs. This distribution is a probability mass function over
the set of policies, $\Sigma$, and allows for randomized selection of
policies.

d. The transitions at which the state of the process is made
known to the decision-maker. These transitions may be determined
probabilistically and, when they do occur, an observation of the
process is said to have taken place. Thus, an observation of the
process is a random variable whose range is the set of state
indices, $\{1, \ldots, N\}$.

e. A rule for termination of sampling.


We adopt the convention that, if a policy change or an observation
occurs at the nth transition, it takes place immediately after the nth
transition has occurred.

There are two sampling rules which are of particular importance in
succeeding chapters, consecutive sampling and $\nu$-step sampling.

A consecutive sampling rule of size n is characterized as follows.
An initial state and an initial policy are selected with probability
one. A total of n transitions are to occur, with n selected in advance.
Each transition is observed. Policy changes, if they occur, take place
at predetermined transitions and, at each change, a new policy
is chosen with probability one. Thus, a consecutive sampling rule of
size n consists of n consecutive observations of the states of a Markov
chain with alternatives under a sequence of policies which is selected
in advance of sampling.

A $\nu$-step sampling rule of size n may be described as follows. A
positive integer, n, a sequence of n positive integers, $\{\nu_1, \ldots, \nu_n\}$,
and a sequence of n policies, $\{\sigma^1, \ldots, \sigma^n\}$, are selected in advance
of sampling. We allow the possibility that some or all of the $\sigma^i$ are
equal. A specific initial state is chosen with probability one and a
sequence of $\nu_1$ transitions are allowed to occur under the policy $\sigma^1$.
The state of the Markov chain is observed after the $\nu_1$th transition.
Then $\nu_2$ transitions occur under policy $\sigma^2$, the state being observed after
the $\nu_2$th transition, and so on. A total of n observations are taken in
this manner. The $\nu$-step sampling rule will be used in one of the terminal
control models of Chapter 5.
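
The two sampling rules can be illustrated with a short simulation. The
sketch below is only an assumed rendering of the verbal description above:
the function name `nu_step_sample`, the zero-based state labels, and the
example matrix are hypothetical. A block of transitions is run under each
policy's transition matrix and the state is recorded only at the end of the
block; taking every block length equal to one gives consecutive sampling
with a predetermined policy sequence.

```python
import random


def nu_step_sample(policy_matrices, nus, x0, rng):
    """Run nus[t] unobserved transitions under policy_matrices[t], then observe.

    policy_matrices[t] is the transition matrix of the t-th preselected policy
    and nus[t] is the corresponding number of transitions. Returns the initial
    state followed by the n observed states.
    """
    observations = [x0]
    state = x0
    for P, nu in zip(policy_matrices, nus):
        for _ in range(nu):
            # Draw the next state from row `state` of the current matrix.
            state = rng.choices(range(len(P)), weights=P[state])[0]
        observations.append(state)
    return observations


# A deterministic two-state chain that always switches state, so the
# observed sequence can be checked by hand.
flip = [[0.0, 1.0], [1.0, 0.0]]
rng = random.Random(0)
observed = nu_step_sample([flip, flip, flip], [2, 1, 3], 0, rng)
consecutive = nu_step_sample([flip, flip, flip], [1, 1, 1], 0, rng)
```

With the switching chain, two transitions return the process to state 0, one
more moves it to state 1, and three more bring it back to state 0.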

We now proceed with the definition of a family of distributions
closed under a sampling rule. A collection, $\mathcal{H}$, of probability distribution
functions is said to be a family of distributions indexed by $\psi$ if all
members of the collection have the same functional form and differ only
in the values assigned to the parameter $\psi$. The set of values which $\psi$
may assume is denoted $\Psi$, termed the admissible parameter set. The
admissible parameter set is assumed to be a connected subset of a
(possibly multidimensional) Euclidean space.

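Let a sampling rule be specified and assume that a sample of n
observations, $\mathbf{x}_n = (x_1, \ldots, x_n)$, has resulted under that sampling rule.
Denote by $\ell(\mathbf{x}_n \mid \mathcal{P})$ the likelihood of the sample $\mathbf{x}_n$ under the given sampling
rule given that $\tilde{\mathcal{P}} = \mathcal{P}$. Let the prior distribution function of $\tilde{\mathcal{P}}$ be
$H(\mathcal{P} \mid \psi)$, a member of $\mathcal{H}$, a family of distributions indexed by $\psi$. Then,
if $dH(\mathcal{P} \mid \psi)$ is the prior probability that $\tilde{\mathcal{P}}$ lies in an infinitesimal
neighborhood of $\mathcal{P}$, the posterior distribution function of $\tilde{\mathcal{P}}$ is
$H(\mathcal{P} \mid \psi, \mathbf{x}_n)$, defined by means of Bayes' Theorem:

$$dH(\mathcal{P} \mid \psi, \mathbf{x}_n) = \frac{\ell(\mathbf{x}_n \mid \mathcal{P})\, dH(\mathcal{P} \mid \psi)}{\int_{S_{K,N}} \ell(\mathbf{x}_n \mid \mathcal{P})\, dH(\mathcal{P} \mid \psi)}. \tag{2.1.1}$$

If $H(\mathcal{P} \mid \psi, \mathbf{x}_n) \in \mathcal{H}$ for all $\psi \in \Psi$ and all samples $\mathbf{x}_n$ of non-zero
probability, then $\mathcal{H}$ is said to be closed with respect to the sampling
rule which determines $\ell(\mathbf{x}_n \mid \mathcal{P})$. In this case the posterior distribution
is denoted $H(\mathcal{P} \mid \psi'')$, where

$$\psi'' = T(\psi, \mathbf{x}_n), \tag{2.1.2}$$

where $T$ is the mapping of $\Psi$ into $\Psi$ induced by the transformation
(2.1.1) when $\mathcal{H}$ is closed under the given sampling rule.

In the special case where the sample consists of a single transition
from state i to state j under the kth alternative in state i, we will
write

$$\psi'' = T_{ij}^k(\psi). \tag{2.1.3}$$

When a fixed policy $\sigma$ is in force, the superscript $k = \sigma_i$ may be
suppressed in (2.1.3).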

In Section 2.3, families of distributions which are closed relative
to the consecutive sampling and $\nu$-step sampling rules are discussed in
detail. In order to carry out this discussion, some properties of the
matrix beta distribution are required. These properties are summarized
in the next section.

The matrix beta distribution will be shown to be the natural conjugate
distribution for the likelihood function of the consecutive sampling rule
and, hence, is of intrinsic importance. Moreover, as will be seen in
Section 2.3, many of the properties of arbitrary families of distributions
which are closed relative to the consecutive sampling rule or the $\nu$-step
sampling rule are related to characteristics of the matrix beta
distribution. For these reasons, the principal facts about this
distribution are summarized in this section without proof. Complete
derivations are given in Chapter 6.

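The K x N random generalized stochastic matrix, $\tilde{\mathcal{P}} = [\tilde{p}_{ij}^k]$, is said
to have the matrix beta distribution with parameter $\mathcal{M} = [m_{ij}^k]$ if $\tilde{\mathcal{P}}$
has the joint density function

$$f_{M\beta}(\mathcal{P} \mid \mathcal{M}) = \begin{cases} k(\mathcal{M}) \prod_{i=1}^{N} \prod_{k=1}^{K_i} \prod_{j=1}^{N} (p_{ij}^k)^{m_{ij}^k - 1}, & \mathcal{P} \in S_{K,N}, \\ 0, & \text{elsewhere.} \end{cases} \tag{2.2.1}$$

The normalizing constant, $k(\mathcal{M})$, is given by

$$k(\mathcal{M}) = \prod_{i=1}^{N} \prod_{k=1}^{K_i} \frac{\Gamma(M_i^k)}{\prod_{j=1}^{N} \Gamma(m_{ij}^k)}, \tag{2.2.2}$$

where

$$M_i^k = \sum_{j=1}^{N} m_{ij}^k, \qquad k = 1, \ldots, K_i; \quad i = 1, \ldots, N. \tag{2.2.3}$$

The parameter $\mathcal{M}$ is a K x N matrix such that

$$m_{ij}^k > 0, \qquad k = 1, \ldots, K_i; \quad i, j = 1, \ldots, N. \tag{2.2.4}$$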


It is shown in Chapter 6 that


‘ | of H) 
“ (Qimpae #2. (2.2.5) 

Bi 

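For $k = 1, \ldots, K_i$ and $i, j = 1, \ldots, N$, the means and variances of the
elements of $\tilde{\mathcal{P}}$ are given by the formulas

$$E[\tilde{p}_{ij}^k] = \bar{p}_{ij}^k = \frac{m_{ij}^k}{M_i^k} \tag{2.2.6}$$

and

$$\operatorname{var}[\tilde{p}_{ij}^k] = \frac{m_{ij}^k (M_i^k - m_{ij}^k)}{(M_i^k)^2 (M_i^k + 1)} = \frac{\bar{p}_{ij}^k (1 - \bar{p}_{ij}^k)}{M_i^k + 1}. \tag{2.2.7}$$

The covariances of the elements of $\tilde{\mathcal{P}}$ are

$$\operatorname{cov}[\tilde{p}_{i\alpha}^k, \tilde{p}_{g\beta}^h] = \begin{cases} -\dfrac{m_{i\alpha}^k m_{i\beta}^k}{(M_i^k)^2 (M_i^k + 1)}, & g = i,\ h = k,\ \alpha \ne \beta \quad (\alpha, \beta = 1, \ldots, N), \\ 0, & g \ne i \text{ or } h \ne k. \end{cases} \tag{2.2.8}$$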
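
The moment formulas can be checked on a small case. The following Python
sketch evaluates (2.2.6) through (2.2.8) for a single row of the parameter
matrix using exact rational arithmetic; the function name `row_moments` and
the parameter values (2, 3, 5) are assumptions chosen for illustration.

```python
from fractions import Fraction


def row_moments(m):
    """Means, variances, and covariance matrix for one row of a matrix beta.

    m holds the positive integer parameters m_ij^k of a single row (i, k).
    """
    M = sum(m)
    mean = [Fraction(mj, M) for mj in m]
    var = [p * (1 - p) / (M + 1) for p in mean]
    size = len(m)
    cov = [
        [var[a] if a == b else -mean[a] * mean[b] / (M + 1) for b in range(size)]
        for a in range(size)
    ]
    return mean, var, cov


mean, var, cov = row_moments([2, 3, 5])
```

Because the elements of a row sum to one, each row of the covariance matrix
sums to zero, which gives a quick consistency check on the formulas.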











Let $\mathbf{x}_n = (x_0, x_1, \ldots, x_n)$ be a sample of n transitions observed under
the consecutive sampling rule, where $x_0$ is the initial state, known in
advance of sampling. Let $f_{ij}^k$ denote the number of transitions in $\mathbf{x}_n$ from
state i to state j under the kth alternative in state i ($k = 1, \ldots, K_i$;
$i, j = 1, \ldots, N$) and define the transition count of the sample as the
K x N matrix $F = [f_{ij}^k]$. Then the conditional probability, given that
$\tilde{\mathcal{P}} = \mathcal{P}$, of observing the sample $\mathbf{x}_n$ is

$$\prod_{i=1}^{N} \prod_{j=1}^{N} \prod_{k=1}^{K_i} (p_{ij}^k)^{f_{ij}^k}. \tag{2.2.9}$$

If the rule by which the sample size n was selected is noninformative in
the sense of Raiffa and Schlaifer [33], then (2.2.9) is the likelihood of
the sample $\mathbf{x}_n$. It is clear that $F$ is a sufficient statistic for this
data-generating process and that the natural conjugate distribution is the
matrix beta distribution.


Theorem 2.2.1 Let $\tilde{\mathcal{P}}$ have the matrix beta distribution with
parameter $\mathcal{M}'$ and suppose that a sample with transition count $F$ is observed
under the consecutive sampling rule with noninformative stopping. Then
the posterior distribution of $\tilde{\mathcal{P}}$ is matrix beta with parameter

$$\mathcal{M}'' = \mathcal{M}' + F. \tag{2.2.10}$$

Proof. By Bayes' Theorem the posterior distribution, $D(\mathcal{P} \mid \mathcal{M}', F)$,
is proportional to the product of the kernel of the likelihood function
and the kernel of the prior distribution,

$$D(\mathcal{P} \mid \mathcal{M}', F) \propto \prod_{i=1}^{N} \prod_{j=1}^{N} \prod_{k=1}^{K_i} (p_{ij}^k)^{f_{ij}^k + m_{ij}^{\prime k} - 1}. \tag{2.2.11}$$

The right side of (2.2.11) is the kernel of a matrix beta distribution
with parameter $\mathcal{M}' + F$. Q.E.D.

Corollary 2.2.2 The family of matrix beta distributions is closed
with respect to the consecutive sampling rule.

Proof. The corollary follows directly from Theorem 2.2.1. Q.E.D.
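
The updating rule of Theorem 2.2.1 amounts to adding the transition count to
the prior parameter. The sketch below illustrates this in Python under
simplifying assumptions that are not part of the theorem: a single fixed
policy is in force for the whole sample, states and alternatives are
numbered from zero, and the names `transition_count` and `posterior_parameter`
are hypothetical.

```python
def transition_count(path, sigma, num_alternatives):
    """Count transitions by (state, alternative used, next state).

    path is (x_0, ..., x_n); sigma[i] is the alternative used in state i;
    num_alternatives[i] is the number of alternatives available in state i.
    """
    n_states = len(num_alternatives)
    F = [[[0] * n_states for _ in range(num_alternatives[i])]
         for i in range(n_states)]
    for a, b in zip(path, path[1:]):
        F[a][sigma[a]][b] += 1
    return F


def posterior_parameter(M_prior, F):
    """Elementwise sum of the prior parameter and the transition count."""
    return [
        [[m + f for m, f in zip(m_row, f_row)] for m_row, f_row in zip(Mi, Fi)]
        for Mi, Fi in zip(M_prior, F)
    ]


# Two states; state 0 has two alternatives, state 1 has one.
num_alternatives = [2, 1]
sigma = [1, 0]                      # second alternative in state 0
path = [0, 1, 1, 0, 0]              # x_0 = 0 followed by four transitions
M_prior = [[[1, 1], [1, 1]], [[1, 1]]]

F = transition_count(path, sigma, num_alternatives)
M_post = posterior_parameter(M_prior, F)
```

Rows of the parameter belonging to alternatives that were never used are
left unchanged, as the unused alternative of state 0 shows here.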


2.3 Families of Distributions Closed Under the Consecutive Sampling Rule





In the following chapters we shall confine our attention to models
based on either the consecutive sampling rule or the $\nu$-step sampling
rule. Some properties of families of distributions which are closed
under either of these rules are established in this section. Specifically,
it is shown that there are an unlimited number of distinct families of
distributions which are closed under the consecutive sampling rule, thus
allowing the decision-maker considerable latitude in selecting a prior
distribution for $\tilde{\mathcal{P}}$. A lemma of fundamental importance for the development
of consecutive sampling models is next established. We then turn to
families of distributions closed under the $\nu$-step sampling rule and it
is shown that the class of such families is identical with the class of
families consisting of probability mixtures of distributions from a family
closed under consecutive sampling. It then follows that any family of
distributions closed under $\nu$-step sampling is also closed under consecutive
sampling. Finally, it is proven that, for an arbitrary prior distribution
on $\tilde{\mathcal{P}}$, if n observations of the Markov chain are obtained under either
sampling rule, then, with probability one, the probability mass of the
posterior distribution tends to concentrate at $\mathcal{P}$, the true state of
nature, as $n \to \infty$.


2.3.1 Families Closed Under Consecutive Sampling

In Section 2.2 it was shown that the natural conjugate distribution
for the consecutive sampling rule is the matrix beta distribution.
Extended natural conjugate distributions for this sampling rule can be
constructed as follows. Let $g(\mathcal{P} \mid \omega)$ be a non-negative Borel function*
defined on $S_{K,N}$ which is positive over some subset of $S_{K,N}$. The
parameter $\omega$ is a point belonging to $\Omega$, a subset of a Euclidean
space. Let $\mathcal{M} = [m_{ij}^k]$ be a K x N matrix with

$$m_{ij}^k > 0, \qquad k = 1, \ldots, K_i; \quad i, j = 1, \ldots, N. \tag{2.3.1}$$

We assume that $g(\mathcal{P} \mid \omega)$ is sufficiently well-behaved that the integral

$$\int_{S_{K,N}} \prod_{i=1}^{N} \prod_{j=1}^{N} \prod_{k=1}^{K_i} (p_{ij}^k)^{m_{ij}^k - 1}\, g(\mathcal{P} \mid \omega)\, d\mathcal{P} = 1 / C(\mathcal{M}, \omega) \tag{2.3.2}$$

exists for all $\omega \in \Omega$ and all $\mathcal{M}$ which satisfy (2.3.1). Let

$$h(\mathcal{P} \mid \mathcal{M}, \omega) = \begin{cases} C(\mathcal{M}, \omega) \prod_{i=1}^{N} \prod_{j=1}^{N} \prod_{k=1}^{K_i} (p_{ij}^k)^{m_{ij}^k - 1}\, g(\mathcal{P} \mid \omega), & \mathcal{P} \in S_{K,N}, \\ 0, & \text{elsewhere.} \end{cases} \tag{2.3.3}$$

The function $h(\mathcal{P} \mid \mathcal{M}, \omega)$ is a non-negative Borel function such that

$$\int_{S_{K,N}} h(\mathcal{P} \mid \mathcal{M}, \omega)\, d\mathcal{P} = 1, \tag{2.3.4}$$

and is, therefore, a probability density function.

* See Loève [28], pp. 106 ff., for a discussion of Borel functions. A
function which is continuous at all but a finite number of points can be
shown to be a Borel function.

Corresponding to any function $g(\mathcal{P} \mid \omega)$ which satisfies the preceding
requirements, we define the extended natural conjugate family, $\mathcal{H}_g$,
indexed by the ordered pair $(\mathcal{M}, \omega)$, as the collection of probability density
functions, $h(\mathcal{P} \mid \mathcal{M}, \omega)$, defined by equation (2.3.3). The following theorem
shows that $\mathcal{H}_g$ is closed under the consecutive sampling rule.

Theorem 2.3.1 Let $\mathcal{H}_g$ be a family of probability density functions,
$h(\mathcal{P} \mid \mathcal{M}, \omega)$, as defined by equation (2.3.3). If the prior distribution of
$\tilde{\mathcal{P}}$ is $h(\mathcal{P} \mid \mathcal{M}', \omega') \in \mathcal{H}_g$ and if a sample $\mathbf{x}_n = (x_0, \ldots, x_n)$ with
transition count $F = [f_{ij}^k]$ is observed by consecutive sampling, then the
posterior distribution of $\tilde{\mathcal{P}}$ is $h(\mathcal{P} \mid \mathcal{M}' + F, \omega')$. Thus, $\mathcal{H}_g$ is
closed under the consecutive sampling rule.

Proof. The posterior distribution of $\tilde{\mathcal{P}}$, $D(\mathcal{P} \mid \mathcal{M}', \omega', F)$, is
proportional to the product of the kernel of the likelihood function and
the kernel of the prior density function,

$$D(\mathcal{P} \mid \mathcal{M}', \omega', F) \propto \prod_{i=1}^{N} \prod_{j=1}^{N} \prod_{k=1}^{K_i} (p_{ij}^k)^{f_{ij}^k + m_{ij}^{\prime k} - 1}\, g(\mathcal{P} \mid \omega'), \tag{2.3.5}$$

from which the theorem follows. Q.E.D.

The parameter $\omega$ provides the decision-maker with additional
flexibility in encoding his prior knowledge about $\tilde{\mathcal{P}}$. It is to be
noted, however, that $\omega$ remains unchanged in the posterior distribution and
is, in that sense, a nuisance parameter. An example of an extended natural
conjugate distribution is presented in Section 6.4.

The next result is of fundamental importance for the development of
the succeeding chapters. Some additional notation is required. Let
$y = (y_1, \ldots, y_{KN})$ be a point in the Euclidean space $E_{KN}$ and let $I$ denote
an interval in $E_{KN}$,

$$I = \{ y \mid \alpha_i \le y_i < \beta_i \ (i = 1, \ldots, KN) \}, \tag{2.3.6}$$

where $\alpha_i < \beta_i$ ($i = 1, \ldots, KN$). Let $Q$ be a partition of $I$ into a finite
number of mutually exclusive and exhaustive intervals, $I_1, \ldots, I_q$. For
each $I_v$ we define the volume

$$V(I_v) = \prod_{i=1}^{KN} (\beta_i^v - \alpha_i^v), \qquad v = 1, \ldots, q, \tag{2.3.7}$$

and let $V = \max_v \{ V(I_v) \}$. Finally, let $x_{ij}^k$ denote the event that a
transition occurs from state i to state j under the kth alternative in
state i.


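Lemma 2.3.2 Let $H(\mathcal{P} \mid \psi) \in \mathcal{H}$, a family of distributions closed under
the consecutive sampling rule and let $g(\mathcal{P})$ be any integrable function of
$\mathcal{P}$ defined on $S_{K,N}$. Then the following identity is valid:

$$\int_{S_{K,N}} p_{ij}^k\, g(\mathcal{P})\, dH(\mathcal{P} \mid \psi) = \bar{p}_{ij}^k(\psi) \int_{S_{K,N}} g(\mathcal{P})\, dH(\mathcal{P} \mid T_{ij}^k(\psi)), \qquad \psi \in \Psi;\ k = 1, \ldots, K_i;\ i, j = 1, \ldots, N, \tag{2.3.8}$$

where $\bar{p}_{ij}^k(\psi)$ is the marginal expectation of $\tilde{p}_{ij}^k$.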

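Proof. Let $I$ be an interval in $E_{KN}$ which contains $S_{K,N}$. For any
partition $Q$ of $I$, let $\mathcal{P}_v = [(p_{ij}^k)_v]$ denote an arbitrary point of
$I_v \cap S_{K,N}$ and let $\Delta H_v(\psi) = P[\tilde{\mathcal{P}} \in I_v \mid \psi]$ when $\tilde{\mathcal{P}}$ has the
distribution function $H(\mathcal{P} \mid \psi)$. Then

$$\int_{S_{K,N}} p_{ij}^k\, g(\mathcal{P})\, dH(\mathcal{P} \mid \psi) = \lim_{V \to 0} \sum_{v} (p_{ij}^k)_v\, g(\mathcal{P}_v)\, \Delta H_v(\psi). \tag{2.3.9}$$

Using Bayes' Theorem, we have

$$\Delta H_v(T_{ij}^k(\psi)) = P[\tilde{\mathcal{P}} \in I_v \mid x_{ij}^k, \psi] = \frac{P[x_{ij}^k \mid \tilde{\mathcal{P}} \in I_v, \psi]\, \Delta H_v(\psi)}{P[x_{ij}^k \mid \psi]}. \tag{2.3.10}$$

But

$$P[x_{ij}^k \mid \tilde{\mathcal{P}} \in I_v, \psi] = \frac{\int_{I_v} p_{ij}^k\, dH(\mathcal{P} \mid \psi)}{\Delta H_v(\psi)} \tag{2.3.11}$$

and, by the mean value theorem, there is a point $\mathcal{P}_v^* = [(p_{ij}^k)_v^*]$ of
$I_v \cap S_{K,N}$ such that

$$P[x_{ij}^k \mid \tilde{\mathcal{P}} \in I_v, \psi] = (p_{ij}^k)_v^*. \tag{2.3.12}$$

Since $\mathcal{P}_v$ is an arbitrary point of $I_v \cap S_{K,N}$, we may set $(p_{ij}^k)_v = (p_{ij}^k)_v^*$
in (2.3.9). Then, noting that $P[x_{ij}^k \mid \psi] = \bar{p}_{ij}^k(\psi)$, equation (2.3.10)
yields

$$(p_{ij}^k)_v\, \Delta H_v(\psi) = \bar{p}_{ij}^k(\psi)\, \Delta H_v(T_{ij}^k(\psi)) \tag{2.3.13}$$

and equation (2.3.9) becomes

$$\int_{S_{K,N}} p_{ij}^k\, g(\mathcal{P})\, dH(\mathcal{P} \mid \psi) = \bar{p}_{ij}^k(\psi) \lim_{V \to 0} \sum_{v} g(\mathcal{P}_v)\, \Delta H_v(T_{ij}^k(\psi)) = \bar{p}_{ij}^k(\psi) \int_{S_{K,N}} g(\mathcal{P})\, dH(\mathcal{P} \mid T_{ij}^k(\psi)). \tag{2.3.14}$$

Q.E.D.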
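
The identity of Lemma 2.3.2 can be verified exactly in the matrix beta
case, where observing a transition into state j adds one to the
corresponding parameter. The Python sketch below does this for a single
row and a monomial test function; the helper names, the parameters
(2, 3, 5), and the choice of test function $g = p_1 p_3^2$ are assumptions
made for the example. It relies on the standard product-moment formula for
a Dirichlet-distributed row, written with rising factorials.

```python
from fractions import Fraction


def rising(x, n):
    """Rising factorial x (x + 1) ... (x + n - 1), equal to 1 when n = 0."""
    out = Fraction(1)
    for t in range(n):
        out *= x + t
    return out


def monomial_moment(m, a):
    """E[prod_j p_j ** a[j]] for one row with matrix beta parameters m."""
    num = Fraction(1)
    for mj, aj in zip(m, a):
        num *= rising(mj, aj)
    return num / rising(sum(m), sum(a))


m = [2, 3, 5]          # prior parameters of the row
g = [1, 0, 2]          # exponents of the test function g = p_1 * p_3**2
j = 1                  # zero-based index of the observed next state

# Left side of the identity: E[p_j * g] under the prior.
lhs = monomial_moment(m, [a + (idx == j) for idx, a in enumerate(g)])

# Right side: prior mean of p_j times E[g] under the one-step posterior.
m_post = [mj + (idx == j) for idx, mj in enumerate(m)]
rhs = Fraction(m[j], sum(m)) * monomial_moment(m_post, g)
```

Both sides evaluate to the same rational number, which is what the lemma
asserts for this family.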


2.3.2 Families Closed Under $\nu$-Step Sampling

Let us now consider the likelihood function associated with a
$\nu$-step sampling rule of size n. This sampling rule is described by the
sequence of transition numbers, $\{\nu_1, \ldots, \nu_n\}$, and by the sequence of
policies, $\{\sigma^1, \ldots, \sigma^n\}$. Let $\mathbf{x}_n = (x_0, \ldots, x_n)$ denote the resulting
observations, where $x_0$ is the known initial state. Letting $p_{ij}^{(\nu)}(\sigma)$
denote the (i,j)th element of the matrix $[P(\sigma)]^\nu$, the conditional
probability, given that $\tilde{\mathcal{P}} = \mathcal{P}$, of observing the sample $\mathbf{x}_n$ is

$$p_{x_0 x_1}^{(\nu_1)}(\sigma^1)\, p_{x_1 x_2}^{(\nu_2)}(\sigma^2) \cdots p_{x_{n-1} x_n}^{(\nu_n)}(\sigma^n). \tag{2.3.15}$$

If the rule by which the sample size n was chosen is noninformative, then
(2.3.15) is the likelihood of the sample $\mathbf{x}_n$.
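
The likelihood (2.3.15) is a product of entries of powers of the policy
transition matrices, which the following Python sketch computes with exact
rational arithmetic. The function names, the zero-based state labels, and
the example matrix are assumptions introduced for illustration only.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(P, nu):
    """nu-th power of P by repeated multiplication (nu >= 0)."""
    n = len(P)
    R = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(nu):
        R = mat_mul(R, P)
    return R


def nu_step_likelihood(policy_matrices, nus, observations):
    """Product over t of the (x_{t-1}, x_t) entry of P(sigma^t) ** nu_t."""
    like = Fraction(1)
    for t, (P, nu) in enumerate(zip(policy_matrices, nus)):
        like *= mat_pow(P, nu)[observations[t]][observations[t + 1]]
    return like


P = [[Fraction(1, 2), Fraction(1, 2)], [Fraction(1, 4), Fraction(3, 4)]]
# Same policy in both blocks; observe after 2 transitions, then after 1 more.
likelihood = nu_step_likelihood([P, P], [2, 1], [0, 1, 1])
```

In this example the two-step probability from state 0 to state 1 is 5/8 and
the one-step probability from state 1 to state 1 is 3/4, so the likelihood
is 15/32.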
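Let $\mathcal{H}$ be a family of probability distributions indexed by $\psi \in \Psi$.
For any fixed positive integer m let $\alpha = (\alpha_1, \ldots, \alpha_m)$ be a stochastic
vector. A probability mixture of distributions from $\mathcal{H}$ is defined to be

$$H(\mathcal{P} \mid \psi_1, \ldots, \psi_m, \alpha) = \sum_{i=1}^{m} \alpha_i\, H(\mathcal{P} \mid \psi_i), \tag{2.3.16}$$

where $H(\mathcal{P} \mid \psi_i) \in \mathcal{H}$ ($i = 1, \ldots, m$). It is clear from the definition that
$H(\mathcal{P} \mid \psi_1, \ldots, \psi_m, \alpha)$ is also a probability distribution function for
$\tilde{\mathcal{P}}$. The mixed extension of $\mathcal{H}$, denoted $\mathcal{H}^*$, is the family of all
probability mixtures of distributions from $\mathcal{H}$ as $\alpha$ ranges over
the set of m-dimensional stochastic vectors (for fixed m), and as m
ranges over the positive integers. Since $H(\mathcal{P} \mid \psi)$ is trivially a
probability mixture, $\mathcal{H} \subset \mathcal{H}^*$.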

The following theorems establish that a family of distributions is
closed under $\nu$-step sampling if and only if it is the mixed extension of
a family closed under consecutive sampling and that such a mixed extension
is also closed under the consecutive sampling rule.





Thee 2.3.3 Let # bo a fanily of distributions closed under 
gonsocutive senpling and let +" be its mixed extension. Then $¢ ‘ 1g 





Blso closed under eonseeutivo sanpling. 
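Proof. Let $x_n$ denote a sample of size n obtained by consecutive sampling. If the prior distribution of $\tilde{\mathcal{P}}$ is $H^*(\mathcal{P} \mid \psi_1', \ldots, \psi_m', \alpha') \in \mathcal{H}^*$, where $\alpha' = (\alpha_1', \ldots, \alpha_m')$, and if $\ell(x_n \mid \mathcal{P})$ is the likelihood function, then the posterior distribution is, using Bayes' Theorem and equation (2.3.16),

$$dH^*(\mathcal{P} \mid \psi_1', \ldots, \psi_m', \alpha', x_n) = \frac{\ell(x_n \mid \mathcal{P})\, dH^*(\mathcal{P} \mid \psi_1', \ldots, \psi_m', \alpha')}{\int \ell(x_n \mid \mathcal{P})\, dH^*(\mathcal{P} \mid \psi_1', \ldots, \psi_m', \alpha')} = \frac{\sum_{i=1}^{m} \alpha_i'\, \ell(x_n \mid \mathcal{P})\, dH(\mathcal{P} \mid \psi_i')}{\sum_{j=1}^{m} \alpha_j' \int \ell(x_n \mid \mathcal{P})\, dH(\mathcal{P} \mid \psi_j')} = \sum_{i=1}^{m} \alpha_i''\, dH(\mathcal{P} \mid \psi_i''), \tag{2.3.17}$$

where

$$\alpha_i'' = \frac{\alpha_i' \int \ell(x_n \mid \mathcal{P})\, dH(\mathcal{P} \mid \psi_i')}{\sum_{j=1}^{m} \alpha_j' \int \ell(x_n \mid \mathcal{P})\, dH(\mathcal{P} \mid \psi_j')}, \qquad i = 1, \ldots, m, \tag{2.3.18}$$

and $\psi_i''$ is defined by equation (2.1.2). Since $\alpha'' = (\alpha_1'', \ldots, \alpha_m'')$ is a stochastic vector, the posterior distribution of $\tilde{\mathcal{P}}$ is $H^*(\mathcal{P} \mid \psi_1'', \ldots, \psi_m'', \alpha'') \in \mathcal{H}^*$. Q.E.D.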
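
The weight update in the proof of Theorem 2.3.3 can be illustrated numerically. The following is a minimal sketch, not part of the original report: it assumes a single row of transition probabilities whose mixture components are Dirichlet (one-row matrix beta) distributions, so that the integral of the likelihood against each component has a closed form. The function names `log_beta` and `update_mixture` are hypothetical.

```python
import math


def log_beta(m):
    """Log of the multivariate beta function prod(Gamma(m_j)) / Gamma(sum(m))."""
    return sum(math.lgamma(x) for x in m) - math.lgamma(sum(m))


def update_mixture(weights, params, counts):
    """Posterior of a mixture of Dirichlet priors for one row of transition probabilities.

    weights: prior mixing weights (a stochastic vector)
    params:  one Dirichlet parameter vector per mixture component
    counts:  observed transition counts out of the state
    Returns (posterior weights, posterior parameter vectors).
    """
    new_params = [[m_j + f_j for m_j, f_j in zip(m, counts)] for m in params]
    # Integral of the likelihood against component i, up to a factor common to all i.
    log_marginal = [log_beta(mp) - log_beta(m) for m, mp in zip(params, new_params)]
    shift = max(log_marginal)  # subtract the maximum for numerical stability
    unnormalized = [w * math.exp(lm - shift) for w, lm in zip(weights, log_marginal)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized], new_params


weights, params = update_mixture([0.5, 0.5], [[1.0, 1.0], [5.0, 1.0]], [8, 2])
print(weights, params)
```

The posterior is again a mixture with the same number of components: each component is updated as under consecutive sampling, and only the mixing weights change, in proportion to how well each component predicted the sample.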


Theorem 2.3.4 Let $\mathcal{H}^*$ be a family of probability distributions indexed by $\psi \in \Psi^*$. A necessary and sufficient condition that $\mathcal{H}^*$ be closed under the ν-step sampling rule is that $\mathcal{H}^*$ be the mixed extension of $\mathcal{H}$, a family of distributions closed under consecutive sampling.
Proof. First assume that n = 1. Let $x_{ij}(\nu, \sigma)$ denote the observation of a transition from i to j over an interval of length ν under the policy $\sigma = (\sigma_1, \ldots, \sigma_N)$. The likelihood function is

$$p^{(\nu)}_{ij}(\sigma) = \sum_{k_1=1}^{N} \cdots \sum_{k_{\nu-1}=1}^{N} p^{\sigma_i}_{i k_1}\, p^{\sigma_{k_1}}_{k_1 k_2} \cdots p^{\sigma_{k_{\nu-1}}}_{k_{\nu-1} j}, \tag{2.3.19}$$

which is the sum of terms, each of which is the likelihood function for a sample sequence of length ν observed under the consecutive sampling rule. Let $H(\mathcal{P} \mid \psi') \in \mathcal{H}^*$ be the prior distribution of $\tilde{\mathcal{P}}$. The differential form of the posterior distribution has the kernel

$$dH(\mathcal{P} \mid \psi', x_{ij}(\nu, \sigma)) \propto \sum_{k_1=1}^{N} \cdots \sum_{k_{\nu-1}=1}^{N} p^{\sigma_i}_{i k_1} \cdots p^{\sigma_{k_{\nu-1}}}_{k_{\nu-1} j}\, dH(\mathcal{P} \mid \psi'). \tag{2.3.20}$$

If $\mathcal{H}^*$ is the mixed extension of a family closed under consecutive sampling, then Theorem 2.3.3 and equation (2.3.20) imply that $H(\mathcal{P} \mid \psi', x_{ij}(\nu, \sigma)) \in \mathcal{H}^*$. Moreover, if $\mathcal{H}^*$ is not the mixed extension of a family of distributions closed under consecutive sampling, then $p^{(\nu)}_{ij}(\sigma)\, dH(\mathcal{P} \mid \psi')$ cannot be the kernel of a distribution in $\mathcal{H}^*$ for all ν, σ, and ψ'. Then, for some ν, σ, and ψ', the posterior distribution is a probability mixture of distributions, not all of which are in $\mathcal{H}^*$, and, therefore, the posterior distribution is not a member of $\mathcal{H}^*$. Thus, we have established necessity and sufficiency for the case n = 1.
For n > 1, the differential form of the posterior distribution of $\tilde{\mathcal{P}}$ has the kernel

$$dH(\mathcal{P} \mid \psi', x_n) \propto \prod_{m=1}^{n} p^{(\nu_m)}_{i_{m-1} i_m}(\sigma_m)\, dH(\mathcal{P} \mid \psi') \propto p^{(\nu_n)}_{i_{n-1} i_n}(\sigma_n)\, dH(\mathcal{P} \mid \psi', x_{n-1}), \tag{2.3.21}$$

and the theorem follows by induction. Q.E.D.








Corollary 2.3.5 If $\mathcal{H}^*$ is a family of distributions closed under the ν-step sampling rule, then $\mathcal{H}^*$ is also closed under the consecutive sampling rule.
Proof. The corollary follows immediately from Theorems 2.3.3 and 2.3.4. Q.E.D.


2.3.3 Large Sample Theory. Let $H(\mathcal{P} \mid \psi)$ be an arbitrary prior distribution function of $\tilde{\mathcal{P}}$. We now show that, if a sample of size n is observed under either the consecutive or the ν-step sampling rule, the probability mass of the posterior distribution tends, as n → ∞, to concentrate at Q, the true state of nature, with probability one. This statement is made precise in Theorems 2.3.8 and 2.3.9. Not only are these results of interest on their own merits, but an important application of Theorems 2.3.8 and 2.3.9 will be made in Chapter 5, where the question of termination of sampling is considered for terminal control models.

Consider a sample of size n obtained under the ν-step sampling rule. For a fixed state i, a fixed policy σ, and a fixed transition interval ν, we shall say a trial occurs whenever the system makes a transition from state i to any other state over a transition interval of length ν under the policy σ. For a fixed state j, let there be associated with the mth trial the random variable $x_m(j)$ which takes the value 1 if the system is next observed in state j and the value zero otherwise. A sample of size n thus generates a sequence $\{x_1(j), \ldots, x_m(j)\}$ of independent, identically distributed random variables which, if Q is the true state of nature, have the probability function

$$P[x_m(j) = 1] = q^{(\nu)}_{ij}(\sigma), \qquad j = 1, \ldots, N, \tag{2.3.22a}$$

$$P[x_m(j) = 0] = 1 - q^{(\nu)}_{ij}(\sigma), \qquad j = 1, \ldots, N, \tag{2.3.22b}$$

and expected value

$$E[x_m(j)] = q^{(\nu)}_{ij}(\sigma), \qquad j = 1, \ldots, N. \tag{2.3.23}$$














The following lemma is an immediate consequence of the strong law of large numbers.


Lemma 2.3.6 Let $(x_1(j), \ldots, x_m(j))$ be an observation of size m of the sequence of trials defined above, for fixed states i and j, a fixed policy σ, and a fixed transition interval ν. If, as n → ∞, state i is entered an infinite number of times and the policy σ and transition interval ν are used infinitely often when in state i, we have, with probability one,

$$\lim_{m \to \infty} \frac{1}{m} \sum_{r=1}^{m} x_r(j) = q^{(\nu)}_{ij}(\sigma), \qquad j = 1, \ldots, N, \tag{2.3.24}$$

where Q is the true state of nature.


We remark that, if ν = 1 and $\sigma_i = k$, Lemma 2.3.6 applies to the consecutive sampling rule and equation (2.3.24) becomes

$$\lim_{m \to \infty} \frac{1}{m} \sum_{r=1}^{m} x_r(j) = q^{k}_{ij}, \qquad j = 1, \ldots, N, \tag{2.3.25}$$

the limit holding with probability one.
A generalized stochastic matrix, $\mathcal{P} = [p^k_{ij}]$, is said to be positive if all of its elements are positive, which implies that

$$0 < p^k_{ij} < 1, \qquad k = 1, \ldots, K_i; \quad i, j = 1, \ldots, N. \tag{2.3.26}$$








Lemma 2.3.7 Let $x_n$ be a sample of size n obtained under the ν-step sampling rule. Assume that, as n → ∞, a fixed state i is observed infinitely often and that, when in state i, the policy σ and transition interval ν are used infinitely often. Then, if the true state of nature, Q, is a positive matrix, every state j (j = 1, ..., N) is, with probability one, observed infinitely often.

Proof. For fixed states i and j, the policy σ, and the transition interval ν, let $\{x_m(j)\}$ be the sequence of trials generated by the sample $x_n$, as defined above. The hypotheses of the lemma imply that m → ∞ as n → ∞ and we have, by Lemma 2.3.6,

$$\lim_{m \to \infty} \frac{1}{m} \sum_{r=1}^{m} x_r(j) = q^{(\nu)}_{ij}(\sigma), \qquad j = 1, \ldots, N, \tag{2.3.27}$$

with probability one. Since $q^{(\nu)}_{ij}(\sigma) > 0$ for j = 1, ..., N, (2.3.27) implies that, with probability one, $x_m(j) = 1$ infinitely often for each state j.


This lemma can probably be proved under the weaker assumption that $Q(\sigma)$ is ergodic, but it is sufficient for our purposes to assume that the true state of nature is a positive matrix. It will be shown in Chapter 4 that, for all prior distributions of $\tilde{\mathcal{P}}$ which satisfy a mild continuity condition, the set of non-positive matrices is a set of measure zero.

We again remark that, by taking ν = 1, Lemma 2.3.7 applies to samples obtained under the consecutive sampling rule as well as under the ν-step sampling rule.

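Let ε be an arbitrary positive number and define $\boldsymbol{\varepsilon}$ to be the K × N matrix each element of which is ε. For any K × N matrices $\mathcal{P}$ and Q let us say that

$$|\mathcal{P} - Q| < \boldsymbol{\varepsilon} \tag{2.3.28}$$

if

$$|p^k_{ij} - q^k_{ij}| < \varepsilon, \qquad k = 1, \ldots, K_i; \quad i, j = 1, \ldots, N. \tag{2.3.29}$$

Clearly, if (2.3.29) holds, then

$$\|\mathcal{P} - Q\| = \Big[ \sum_{i=1}^{N} \sum_{k=1}^{K_i} \sum_{j=1}^{N} (p^k_{ij} - q^k_{ij})^2 \Big]^{1/2} < \varepsilon \sqrt{KN}, \tag{2.3.30}$$

and the norm, $\|\mathcal{P} - Q\|$, can be made arbitrarily small by an appropriate choice of ε. Let $H(\mathcal{P} \mid \psi)$ be an arbitrary prior distribution function of $\tilde{\mathcal{P}}$ and assume that a sample, $x_n$, of size n is observed. Denote by $H(\mathcal{P} \mid \psi, x_n)$ the posterior distribution of $\tilde{\mathcal{P}}$ and, for fixed ε, let

$$P[\,|\tilde{\mathcal{P}} - Q| < \boldsymbol{\varepsilon} \mid \psi, x_n\,] = \int_{\mathcal{P} \in E} dH(\mathcal{P} \mid \psi, x_n) \tag{2.3.31}$$

denote the posterior probability of the set

$$E = \{\mathcal{P} : |\mathcal{P} - Q| < \boldsymbol{\varepsilon}\}. \tag{2.3.32}$$

When we say that the posterior probability mass tends, as n → ∞, to concentrate at Q, the true state of nature, with probability one, we mean that, for any ε > 0,

$$\lim_{n \to \infty} P[\,|\tilde{\mathcal{P}} - Q| < \boldsymbol{\varepsilon} \mid \psi, x_n\,] = 1, \tag{2.3.33}$$

the limit holding with probability one.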


Theorem 2.3.8 Let $H(\mathcal{P} \mid \psi)$ be an arbitrary prior distribution function of $\tilde{\mathcal{P}}$. Let $x_n$ be a sample of size n obtained from a Markov chain with alternatives under the consecutive sampling rule. Assume that the sampling strategy is such that, as n → ∞, if state i is entered infinitely often, every alternative in state i is sampled infinitely often (i = 1, ..., N). If Q, the true state of nature, is a positive matrix, then, for any ε > 0,

$$\lim_{n \to \infty} P[\,|\tilde{\mathcal{P}} - Q| < \boldsymbol{\varepsilon} \mid \psi, x_n\,] = 1, \tag{2.3.34}$$

the limit holding with probability one, provided $H(\mathcal{P} \mid \psi)$ assigns positive probability to the set E defined by equation (2.3.32).

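Proof. Let $F(n) = [f^k_{ij}(n)]$ be the transition count of the sample $x_n$. The posterior distribution of $\tilde{\mathcal{P}}$ is $H(\mathcal{P} \mid \psi, F(n))$, where

$$dH(\mathcal{P} \mid \psi, F(n)) = \frac{\prod_{i,j,k} (p^k_{ij})^{f^k_{ij}(n)}\, dH(\mathcal{P} \mid \psi)}{\int \prod_{i,j,k} (p^k_{ij})^{f^k_{ij}(n)}\, dH(\mathcal{P} \mid \psi)}. \tag{2.3.35}$$

Letting

$$M(n) = [m^k_{ij}(n)] = [f^k_{ij}(n) + 1] \tag{2.3.36}$$

and multiplying the numerator and denominator of (2.3.35) by the normalizing constant of the matrix beta density with parameter M(n), defined by equation (2.2.2), we have

$$dH(\mathcal{P} \mid \psi, F(n)) = \frac{f^{(K,N)}_{M\beta}(\mathcal{P} \mid M(n))\, dH(\mathcal{P} \mid \psi)}{\int f^{(K,N)}_{M\beta}(\mathcal{P} \mid M(n))\, dH(\mathcal{P} \mid \psi)}. \tag{2.3.37}$$

Let

$$\nu^k_i(n) = \sum_{j=1}^{N} f^k_{ij}(n), \qquad k = 1, \ldots, K_i; \quad i = 1, \ldots, N, \tag{2.3.38}$$

denote the number of times that alternative k is used in state i in a sample of size n. As n → ∞ at least one of the states of the chain is entered infinitely often, and Lemma 2.3.7 and the hypotheses of the theorem imply that, with probability one, every state is entered infinitely often. By the assumption on the sampling strategy, $\nu^k_i(n) \to \infty$ as n → ∞ with probability one (k = 1, ..., $K_i$; i = 1, ..., N). The mean of the distribution $f^{(K,N)}_{M\beta}(\mathcal{P} \mid M(n))$ is $\bar{\mathcal{P}}(n) = [\bar{p}^k_{ij}(n)]$, where

$$\bar{p}^k_{ij}(n) = \frac{f^k_{ij}(n) + 1}{\nu^k_i(n) + N}, \qquad k = 1, \ldots, K_i; \quad i, j = 1, \ldots, N. \tag{2.3.39}$$

Thus, if $Q = [q^k_{ij}]$, Lemma 2.3.6 implies that, with probability one,

$$\lim_{n \to \infty} \bar{p}^k_{ij}(n) = q^k_{ij} > 0, \qquad k = 1, \ldots, K_i; \quad i, j = 1, \ldots, N. \tag{2.3.40}$$

We now show that, as n → ∞, the probability mass of $f^{(K,N)}_{M\beta}(\mathcal{P} \mid M(n))$ tends, with probability one, to concentrate at Q. If $\tilde{\mathcal{P}}$ is a random matrix with the density function $f^{(K,N)}_{M\beta}(\mathcal{P} \mid M(n))$, the marginal variance of $\tilde{p}^k_{ij}$ is

$$V[\tilde{p}^k_{ij} \mid M(n)] = \frac{\bar{p}^k_{ij}(n)\,[1 - \bar{p}^k_{ij}(n)]}{\nu^k_i(n) + N + 1}, \qquad k = 1, \ldots, K_i; \quad i, j = 1, \ldots, N. \tag{2.3.41}$$

Thus, with probability one,

$$\lim_{n \to \infty} V[\tilde{p}^k_{ij} \mid M(n)] = 0, \qquad k = 1, \ldots, K_i; \quad i, j = 1, \ldots, N. \tag{2.3.42}$$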


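Let ε* > 0 and δ (0 < δ < 1) be given. Define the set E as in equation (2.3.32) and let

$$P[\,|\tilde{\mathcal{P}} - Q| < \boldsymbol{\varepsilon}^* \mid M(n)\,] = \int_{E} f^{(K,N)}_{M\beta}(\mathcal{P} \mid M(n))\, d\mathcal{P}. \tag{2.3.43}$$

Since

$$\{\mathcal{P} : |\mathcal{P} - Q| < \boldsymbol{\varepsilon}^*\} = \bigcap_{i,j,k} \{\mathcal{P} : |p^k_{ij} - q^k_{ij}| < \varepsilon^*\},$$

De Morgan's law yields

$$\{\mathcal{P} : |\mathcal{P} - Q| < \boldsymbol{\varepsilon}^*\}^C = \bigcup_{i,j,k} \{\mathcal{P} : |p^k_{ij} - q^k_{ij}| \ge \varepsilon^*\},$$

where C denotes the set complement, so that

$$1 - P[\,|\tilde{\mathcal{P}} - Q| < \boldsymbol{\varepsilon}^* \mid M(n)\,] \le \sum_{i,j,k} P[\,|\tilde{p}^k_{ij} - q^k_{ij}| \ge \varepsilon^* \mid M(n)\,]. \tag{2.3.44}$$

The marginal variance of $\tilde{p}^k_{ij}$ is

$$V[\tilde{p}^k_{ij} \mid M(n)] = \int (p^k_{ij} - \bar{p}^k_{ij}(n))^2\, f^{(K,N)}_{M\beta}(\mathcal{P} \mid M(n))\, d\mathcal{P}, \qquad k = 1, \ldots, K_i; \quad i, j = 1, \ldots, N. \tag{2.3.45}$$

By equations (2.3.40) and (2.3.42), there exists an integer n* such that, for all n > n*,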


ae | wc 1 abe 4 Wehe cece K, (2030093 
—— (vy 4 . ym 4) /< ge Lo Jely coool 


with probability one. Thus, for n > n*, the inequality (2.3.44) becomes

$$1 - P[\,|\tilde{\mathcal{P}} - Q| < \boldsymbol{\varepsilon}^* \mid M(n)\,] < \delta, \qquad n > n^*, \tag{2.3.50}$$

or

$$P[\,|\tilde{\mathcal{P}} - Q| < \boldsymbol{\varepsilon}^* \mid M(n)\,] > 1 - \delta, \qquad n > n^*, \tag{2.3.51}$$

with probability one. Since δ is arbitrary,

$$\lim_{n \to \infty} P[\,|\tilde{\mathcal{P}} - Q| < \boldsymbol{\varepsilon}^* \mid M(n)\,] = 1, \tag{2.3.52}$$

the limit holding with probability one.
Again defining E as in (2.3.32) and letting $E^c$ be the complement of E in $\mathcal{S}_{K,N}$, we have, from equation (2.3.37),
Let δ(n) be the maximum of the density function $f^{(K,N)}_{M\beta}(\mathcal{P} \mid M(n))$ on the compact set $E^c$. Equation (2.3.52) implies that $\lim_{n \to \infty} \delta(n) = 0$, with probability one. Thus,

Theorem 2.3.9 Let $H(\mathcal{P} \mid \psi)$ be an arbitrary prior distribution function of $\tilde{\mathcal{P}}$. Let $x_n$ be a sample of size n obtained from a Markov chain with alternatives under the ν-step sampling rule. Assume that, when the system is observed in state i, the sampling rule is restricted to policies from $\Sigma_i \subset \Sigma$ and to transition intervals from a finite set $\{\nu_1, \ldots, \nu_r\}$, such that as n → ∞, if state i is observed infinitely often, every policy in $\Sigma_i$ and every transition interval are used infinitely often (i = 1, ..., N). If Q, the true state of nature, is a positive matrix, then for any ε > 0,

$$\lim_{n \to \infty} P[\,|\tilde{\mathcal{P}} - Q| < \boldsymbol{\varepsilon} \mid \psi, x_n\,] = 1,$$

the limit holding with probability one, provided $H(\mathcal{P} \mid \psi)$ assigns positive probability to the set E defined by equation (2.3.32).
Proof. Let $K_i$ be the total number of ordered pairs, (σ, ν), where $\sigma \in \Sigma_i$ and ν is one of the admissible transition intervals (i = 1, ..., N), and let $K = \sum_{i=1}^{N} K_i$. When in state i, let k index the possible policy and transition interval combinations, (σ, ν). Let

$$\pi^k_{ij} = p^{(\nu)}_{ij}(\sigma), \qquad k = 1, \ldots, K_i; \quad i, j = 1, \ldots, N, \tag{2.3.56}$$

and define the K × N matrix $\Pi = [\pi^k_{ij}]$. Clearly, Π is a generalized stochastic matrix. If the index k corresponds to the pair (σ, ν), let $f^k_{ij}(n)$ be the number of times a transition occurred from state i to state j in the sample $x_n$ over the transition interval ν when the system was governed by the policy σ. Then the posterior distribution of $\tilde{\Pi}$ is $H(\Pi \mid \psi, F(n))$, where

$$dH(\Pi \mid \psi, F(n)) = \frac{f^{(K,N)}_{M\beta}(\Pi \mid M(n))\, dH(\Pi \mid \psi)}{\int f^{(K,N)}_{M\beta}(\Pi \mid M(n))\, dH(\Pi \mid \psi)},$$

where $M(n) = [f^k_{ij}(n) + 1]$. The proof of the theorem from this point is identical to the proof of Theorem 2.3.8. Q.E.D.





We remark that the assumptions in Theorems 2.3.8 and 2.3.9 concerning policies which are used infinitely often are not restrictive. It will usually be possible, after a finite amount of sampling, to eliminate from further consideration those policies which are used only a finite number of times. Examples of such elimination of policies by dominance arguments will be given in Chapter 5. In any case, the theorems apply to the marginal distribution of those alternative rows of $\tilde{\mathcal{P}}$ which are observed infinitely often.
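
The concentration of posterior mass described in Theorem 2.3.8 can be observed numerically for a single row of transition probabilities. The sketch below is illustrative only: it assumes a uniform prior, so that after counts f the row has posterior mean (f_j + 1)/(ν + N) and marginal variance p̄(1 - p̄)/(ν + N + 1), as in equations (2.3.39) and (2.3.41). The helper names `posterior_mean_var` and `simulate_counts` are hypothetical.

```python
import random


def posterior_mean_var(counts):
    """Posterior mean and marginal variance of one row under a uniform prior."""
    n_states = len(counts)
    total = sum(counts)  # number of times the alternative was used
    mean = [(f + 1) / (total + n_states) for f in counts]
    var = [p * (1 - p) / (total + n_states + 1) for p in mean]
    return mean, var


def simulate_counts(q, n, rng):
    """Draw n transitions from the true row q and return the transition counts."""
    counts = [0] * len(q)
    for _ in range(n):
        j = rng.choices(range(len(q)), weights=q)[0]
        counts[j] += 1
    return counts


rng = random.Random(0)
q = [0.2, 0.5, 0.3]  # assumed true state of nature for this row
for n in (10, 100, 10000):
    mean, var = posterior_mean_var(simulate_counts(q, n, rng))
    print(n, [round(p, 3) for p in mean], max(var))
```

As n grows, the printed means approach q and the largest marginal variance shrinks roughly like 1/n, which is the behavior the proof exploits.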








Let $\mathcal{H}$ be a family of distributions indexed by $\psi \in \Psi$ which is closed under an arbitrary sampling rule. Some general properties of $\mathcal{H}$ are derived in this section. The symbol $\ell(x_n \mid \mathcal{P})$ will be used throughout for the likelihood of a sample of size n, conditional on $\tilde{\mathcal{P}} = \mathcal{P}$, under the given sampling rule.


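Theorem 2.4.1 Let $\tilde{\mathcal{P}}$ have a discrete prior distribution,

$$P[\tilde{\mathcal{P}} = \mathcal{P}_i] = \alpha_i, \qquad i = 1, 2, \ldots, m, \tag{2.4.1}$$

where $\alpha_i > 0$, $\sum_{i=1}^{m} \alpha_i = 1$. For a fixed integer, m, let $\mathcal{H}_m$ be the family of all such discrete distributions, indexed by $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_m)$. Then $\mathcal{H}_m$ is closed under all sampling rules.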
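Proof. Let $\ell(x_n \mid \mathcal{P})$ be the likelihood function for an arbitrary sampling rule. If $\alpha'$ is the prior distribution of $\tilde{\mathcal{P}}$, the posterior probability of $\mathcal{P}_i$ is

$$\alpha_i'' = \frac{\ell(x_n \mid \mathcal{P}_i)\, \alpha_i'}{\sum_{j=1}^{m} \ell(x_n \mid \mathcal{P}_j)\, \alpha_j'}, \qquad i = 1, \ldots, m. \tag{2.4.2}$$

Since $\alpha_i'' > 0$ and $\sum_{i=1}^{m} \alpha_i'' = 1$, $\alpha'' = (\alpha_1'', \alpha_2'', \ldots, \alpha_m'') \in \mathcal{H}_m$.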


This theorem, while almost trivial, is of considerable importance for the solution of Bayesian decision problems in a Markov chain in practice. In many cases it may be feasible to place positive probability on only a finite set of points of $\mathcal{S}_{K,N}$ and to solve the corresponding discrete problem, thus considerably simplifying the computation. We shall not emphasize this consideration any further since most of our theorems are stated in terms of Stieltjes integrals and, hence, are applicable to discrete, continuous, and mixed prior distributions.

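
The discrete update of equation (2.4.2) is simple enough to sketch directly. The example below is illustrative and not part of the original report: it assumes two hypothetical candidate transition matrices for a two-state chain without alternatives, sampled consecutively, and the function names are invented for the illustration.

```python
def chain_likelihood(P, path):
    """Likelihood of a consecutively observed state path under transition matrix P."""
    like = 1.0
    for i, j in zip(path, path[1:]):
        like *= P[i][j]
    return like


def discrete_posterior(prior, likelihoods):
    """Posterior weights over a finite set of candidate matrices (Bayes' theorem)."""
    joint = [a * l for a, l in zip(prior, likelihoods)]
    total = sum(joint)
    if total == 0:
        raise ValueError("the sample has zero likelihood under every candidate")
    return [x / total for x in joint]


candidates = [
    [[0.9, 0.1], [0.5, 0.5]],  # hypothetical candidate 1
    [[0.5, 0.5], [0.5, 0.5]],  # hypothetical candidate 2
]
path = [0, 0, 0, 1, 0]
posterior = discrete_posterior(
    [0.5, 0.5], [chain_likelihood(P, path) for P in candidates]
)
print(posterior)
```

Only the likelihood of the sample enters the update, so the same function applies whatever sampling rule produced the path, which is the content of the theorem.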

Theorem 2.4.2 Let $\mathcal{H} = \{H(\mathcal{P} \mid \psi) \mid \psi \in \Psi\}$ be a family of distributions closed under a given sampling rule and, for a fixed policy σ, let $\mathcal{H}_\sigma = \{F_\sigma(P \mid \psi) \mid \psi \in \Psi'\}$, where $\Psi' \subset \Psi$, be the corresponding family of marginal distributions of the N × N stochastic matrix $\tilde{P}(\sigma)$.

If the sampling rule is such that, for n = 1, 2, ..., it is possible to observe a sample of size n under the fixed policy σ, and if the likelihood of any sample observed under the policy σ does not depend on elements of $\mathcal{P}$ not in $P(\sigma)$, then $\mathcal{H}_\sigma$ is also closed under the given sampling rule.
Proof. Let $\ell(x_n \mid \mathcal{P})$ be the likelihood function corresponding to the given sampling rule and let $\ell_\sigma(x_n \mid P(\sigma))$ be the likelihood of the sample $x_n$ from the Markov chain governed by $P(\sigma)$. The hypotheses of the theorem imply that

$$\ell(x_n \mid \mathcal{P}) = \ell_\sigma(x_n \mid P(\sigma))$$

for all samples $x_n$ from the chain governed by $P(\sigma)$. Let $R_\sigma$ be the range set of the (K - N) × N generalized stochastic matrix formed by deleting from $\mathcal{P}$ all rows $p^k_i$ such that $k = \sigma_i$ (i = 1, ..., N). Then if $F_\sigma(P \mid \psi')$ is the marginal prior distribution of $\tilde{P}(\sigma)$ and if the sample $x_n$ is observed, the posterior distribution of $\tilde{P}(\sigma)$ is $F_\sigma(P \mid \psi', x_n) = F_\sigma(P \mid \psi'')$ for all $\psi' \in \Psi'$, where ψ'' is defined by equation (2.1.2). Thus, $F_\sigma(P \mid \psi', x_n) \in \mathcal{H}_\sigma$ and $\mathcal{H}_\sigma$ is closed under the sampling rule. Q.E.D.





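The next theorem deals with the continuity of the expectation

$$\bar{g}(\psi) = \int g(\mathcal{P})\, dH(\mathcal{P} \mid \psi)$$

when regarded as a function of ψ, where $g(\mathcal{P})$ is any integrable function of $\mathcal{P}$.

A distribution function, $H(\mathcal{P} \mid \psi)$, is said to be continuous in ψ at a point $\mathcal{P} \in \mathcal{S}_{K,N}$ if, for any ε > 0, there exists a δ > 0 such that, for any fixed ψ, $|H(\mathcal{P} \mid \psi) - H(\mathcal{P} \mid \psi')| < \varepsilon$ whenever $\|\psi - \psi'\| < \delta$. A point, $\mathcal{P} \in \mathcal{S}_{K,N}$, is said to be a continuity point of $H(\mathcal{P} \mid \psi)$ if $H(\mathcal{P} \mid \psi)$ is a continuous function of $\mathcal{P}$ at that point, for any fixed value of ψ.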

























Definition. Let $\mathcal{H}$ be a family of distribution functions indexed by $\psi \in \Psi$. $\mathcal{H}$ is said to be continuous in ψ whenever $H(\mathcal{P} \mid \psi)$ is a continuous function of ψ at each of its continuity points, $\mathcal{P}$.


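Theorem 2.4.3 Let $\mathcal{H}$ be a family of distribution functions indexed by $\psi \in \Psi$ which is continuous in ψ and let $g(\mathcal{P})$ be any integrable function of $\mathcal{P}$ defined on the set $\mathcal{S}_{K,N}$. If $\bar{g}(\psi)$ is a function of ψ defined by the integral

$$\bar{g}(\psi) = \int_{\mathcal{S}_{K,N}} g(\mathcal{P})\, dH(\mathcal{P} \mid \psi), \qquad \psi \in \Psi, \tag{2.4.5}$$

then $\bar{g}(\psi)$ is continuous on Ψ.
Proof. Let ψ be fixed and let $\{\psi_n\}$ be any sequence of points of Ψ which converges to ψ, where $\psi_n \ne \psi$ (n = 1, 2, ...). Let $\{H(\mathcal{P} \mid \psi_n)\}$ be the corresponding sequence of distribution functions from $\mathcal{H}$. Since $\mathcal{H}$ is continuous in ψ,

$$\lim_{n \to \infty} H(\mathcal{P} \mid \psi_n) = H(\mathcal{P} \mid \psi) \tag{2.4.6}$$

at every continuity point of $H(\mathcal{P} \mid \psi)$ and, by the Helly-Bray Theorem,

$$\lim_{n \to \infty} \int_{\mathcal{S}_{K,N}} g(\mathcal{P})\, dH(\mathcal{P} \mid \psi_n) = \int_{\mathcal{S}_{K,N}} g(\mathcal{P})\, dH(\mathcal{P} \mid \psi). \tag{2.4.7}$$

Thus, for every sequence $\{\psi_n\}$ which converges to ψ, $\bar{g}(\psi_n) \to \bar{g}(\psi)$ and, therefore, $\bar{g}(\psi)$ is continuous at ψ. Q.E.D.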





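Corollary 2.4.4 Let $\mathcal{H} = \{H(\mathcal{P} \mid \psi) \mid \psi \in \Psi\}$ be a family of distribution functions indexed by $\psi \in \Psi$ which is continuous in ψ and, for a fixed policy σ, let $\mathcal{H}_\sigma$ be the corresponding family of marginal distribution functions, $F_\sigma(P \mid \psi)$. Then $\mathcal{H}_\sigma$ is a family of distributions continuous in ψ.

Proof. In Theorem 2.4.3, let $g(\mathcal{P}) = 1$ if $\mathcal{P} \in A$ and $g(\mathcal{P}) = 0$ otherwise, where, for a fixed point $P^0$,

$$A = \{\mathcal{P} \mid P(\sigma) \le P^0\}.$$

Then equation (2.4.5) becomes

$$F_\sigma(P^0 \mid \psi) = \int_{A} dH(\mathcal{P} \mid \psi), \tag{2.4.8}$$

and $F_\sigma(P^0 \mid \psi)$ is a continuous function of ψ for any $P^0$.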








By (4, fi, even nk. (2.8.6) 
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- Cf.5 for exemple, Lodve [23], pp. 1000182. 
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cistrititions “mdesesd; ly A e Suppose that, fer over RPI de gy 9 & 
corresponding density function $h(\mathcal{P} \mid \psi)$ exists and that $h(\mathcal{P} \mid \psi)$ is a continuous function of ψ for every $\mathcal{P} \in \mathcal{S}_{K,N}$. Then $\mathcal{H}$ is a family of distributions continuous in ψ.
of integral calculus which states that, if $h(\mathcal{P} \mid \psi)$ is continuous in ψ
It is clear from equations (2.2.1) and (2.2.2) that the matrix beta density function, $f^{(K,N)}_{M\beta}(\mathcal{P} \mid M)$, is a continuous function of the K × N matrix M.
— ae 














CHAPTER 3 


ADAPTIVE CONTROL PROBLEMS












assumed to operate indefinitely, is sampled consecutively; that is, the decision-maker knows the state of the process after each transition. Information about $\tilde{\mathcal{P}}$ is gathered in this manner and the decision-maker may alter the current policy at any time, as dictated by his state of knowledge about $\tilde{\mathcal{P}}$. Such a process is an adaptive control process.





It is assumed that any sampling costs are included in the transition reward matrix, $\mathcal{R} = [r^k_{ij}]$. This implies that either the sampling costs are negligible when compared with the transition rewards or that the process is operated in such a manner that a sampling cost must be incurred after each transition. Models in which the decision-maker may choose to sample or not to sample will be considered in Chapter 5.

When future rewards are discounted to a present value we shall speak of a discounted adaptive control process. It is this class of problems which will be discussed, for the most part, in the present chapter. The time between two consecutive transitions is assumed to be constant and can be taken as the time unit. Let β be the present value of a unit reward earned one unit of time in the future (0 < β < 1). Since the present value of the maximum possible reward on the nth transition in the future decreases as $\beta^n$, it is clear that the total discounted reward earned over an infinite period under any sequence of policies is finite. A natural criterion to use in choosing policies is, therefore, the expected total discounted reward over an infinite period and we shall define the discounted adaptive control problem to be the problem of selecting a sequence of policies so as to maximize this quantity.

In the present section the discounted adaptive control problem is formulated in terms of a set of simultaneous functional equations. It is shown in the following section that there exists a unique bounded set of continuous solutions to these equations. In Section 3.3 a method of successive approximations is described which converges monotonically and uniformly to this unique set of solutions and the question of policy convergence is considered. The concept of recursive computation is then introduced and a numerical example is presented. The chapter concludes with a discussion of the problems involved in treating undiscounted adaptive control processes in a Markov chain.





two-armed bandit problem--was treated by Bellman [7] in 1956, using dynamic programming and a beta prior distribution. The method was


Cozzolino [13] applied Bellman's formulation of the two-armed bandit problem to the case of a two-state Markov chain with two alternatives in each state, assuming a matrix beta prior distribution. He mapped decision regions in the parameter space of the prior distribution for the special case of one unknown transition probability vector. Cozzolino, Gonzalez-Zubieta, and Miller [14] have recently suggested various heuristic treatments of the discounted adaptive control problem, basing their results on simulation studies. Freimer [18, 19] has obtained a solution of the discounted adaptive control problem in the case of quadratic cost functions by reducing the stochastic formulation to a deterministic one in terms of certainty equivalents.
The functional equations formulated in this chapter generalize the results of these authors and, in spirit, follow Bellman's derivation [6]. Our contribution to the treatment of this problem consists of the following:
a. Proof of the existence of a unique bounded set of continuous solutions to these functional equations.
b. Derivation of a method of successive approximations which converges monotonically and uniformly to this unique set of solutions.
c. Introduction of recursive computation techniques for the numerical solution of the discounted adaptive control problem.

Let the prior distribution of $\tilde{\mathcal{P}}$, $H(\mathcal{P} \mid \psi)$, be a member of a family, $\mathcal{H}$, indexed by $\psi \in \Psi$. The ordered pair, (i, ψ), where i = 1, ..., N and $\psi \in \Psi$, can be regarded as the generalized state of the system. Since the process is to be sampled consecutively, it must be assumed that $\mathcal{H}$ is closed under the consecutive sampling rule in order that we may meaningfully refer to ψ as indexing the decision-maker's state of knowledge as sampling progresses.

Let $v_i(\psi)$ denote the supremum of the expected discounted reward over an infinite period when the system starts from the generalized state, (i, ψ). If $|r^k_{ij}| \le R$ for all i, j, and k, the discounted total reward under any sampling strategy is bounded by

$$\sum_{n=0}^{\infty} \beta^n R = \frac{R}{1 - \beta}, \tag{3.1.1}$$

and, therefore, $v_i(\psi)$ exists for i = 1, ..., N and all $\psi \in \Psi$. It will be shown at the conclusion of this section that $v_i(\psi)$ is attained under an optimal sampling strategy and, hence, can be regarded as the maximum expected discounted reward when the system starts from (i, ψ).

If, when in state (i, ψ), it is decided to choose the kth alternative and the system makes a transition to state j, the supremum of the posterior expected discounted reward is

$$r^k_{ij} + \beta\, v_j(T^k_{ij}(\psi)). \tag{3.1.2}$$

The probability of the sample outcome j, unconditional with regard to the prior distribution of $\tilde{\mathcal{P}}$, given that the system is in state (i, ψ) and that alternative k is in use, is

$$\bar{p}^k_{ij}(\psi) = \int p^k_{ij}\, dH(\mathcal{P} \mid \psi), \tag{3.1.3}$$


the marginal prior expectation of $\tilde{p}^k_{ij}$. Let

$$q^k_i(\psi) = \sum_{j=1}^{N} \bar{p}^k_{ij}(\psi)\, r^k_{ij}, \qquad k = 1, \ldots, K_i; \quad i = 1, \ldots, N, \tag{3.1.4}$$

denote the mean one-step transition reward when the system is in state (i, ψ) and alternative k is used. Then the supremum of the discounted expected reward when starting from (i, ψ) must satisfy the following set of simultaneous functional equations,

$$v_i(\psi) = \max_{1 \le k \le K_i} \Big\{ q^k_i(\psi) + \beta \sum_{j=1}^{N} \bar{p}^k_{ij}(\psi)\, v_j(T^k_{ij}(\psi)) \Big\}, \qquad i = 1, \ldots, N; \quad \psi \in \Psi; \quad 0 < \beta < 1. \tag{3.1.5}$$
We now consider the existence of the maximum expected discounted reward over an infinite period. In order to do this, it is necessary to precisely define the notion of a sampling strategy for an adaptive control process.
Let the policies, $\sigma \in \Sigma$, be indexed by the integers 0 through J-1, where J is the number of elements in Σ. Thus, $\Sigma = \{\sigma_0, \sigma_1, \ldots, \sigma_{J-1}\}$. Suppose the system starts from the generalized state $(i_0, \psi)$ and that alternative k has been selected in state $i_0$. We can, before the first transition occurs, decide which alternative to use in each state j for the second transition. This consists of the choice of a policy, $\sigma \in \Sigma$, and can be denoted by $d_1(i_0; k)$, a function with range {0, 1, ..., J-1}.
In general, before any transitions have occurred, we can prescribe a policy to be used immediately after the nth transition (n = 1, 2, ...). Let $x_{n-1} = (i_0, i_1, \ldots, i_{n-1})$ be a possible sample history of the first n-1 transitions and let $s_{n-1}$ be the sequence of policies under which the sample $x_{n-1}$ occurred, together with the policy under which the nth transition will occur. The policy history, $s_{n-1}$, is determined by evaluating the decision functions $d_1, d_2, \ldots, d_{n-1}$ along the sample history. Conditional on the chain having arrived in state $i_{n-1}$ with sample history $x_{n-1}$ and policy history $s_{n-1}$, we may select an alternative for use in state j after the nth transition for each of the N states to which the nth transition may bring the system. This consists of the selection of a policy, $\sigma \in \Sigma$, and is denoted by $d_n(x_{n-1}; s_{n-1})$, a function with range {0, ..., J-1}. Since there are $N^{n-1}$ different sample histories, $x_{n-1}$, which start from a fixed state $i_0$, it is necessary to specify $N^{n-1}$ values of the nth-level decision function $d_n(x_{n-1}; s_{n-1})$. The specification of a complete set of decision functions, $d_n(x_{n-1}; s_{n-1})$, for n = 1, 2, 3, ... and for all possible sample histories, together with the choice of an initial alternative in state $i_0$, constitutes a sampling strategy, d. Let $D_i$ denote the set of all possible sampling strategies when the system starts in state i.
In the following theorem it is shown that $v_i(\psi)$, a least upper bound, is attained for some strategy $d^* \in D_i$. This is done by mapping the space of strategies $D_i$ onto a compact subset of the real line and showing that the corresponding mapping of $v_i(\psi, d)$, the expected discounted reward under strategy d, is continuous on this set.
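
The idea of encoding a sequence of policy indices as a point of the unit interval can be sketched as follows. This is an illustration only: it evaluates the J-ary fraction for a finite prefix of a decision sequence, and the function name `jary_number` is hypothetical.

```python
from fractions import Fraction


def jary_number(delta, J):
    """Value of the J-ary fraction 0.d1 d2 d3 ... for a finite prefix of digits."""
    if any(not 0 <= d < J for d in delta):
        raise ValueError("each digit must lie in {0, ..., J-1}")
    return sum(Fraction(d, J ** r) for r, d in enumerate(delta, start=1))


# Two sequences that agree in their first two digits differ by at most J**-2.
y1 = jary_number([1, 0, 2, 2], 3)
y2 = jary_number([1, 0, 0, 0], 3)
print(y1, y2, abs(y1 - y2))
```

Sequences that share a long common prefix map to nearby points, which is why a reward that depends only weakly on late decisions (because of discounting) becomes a continuous function of the encoded point.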















Theorem 3.1.1 Let $v_i(\psi, d)$ be the total expected discounted reward in a Markov chain with alternatives when the process starts from the generalized state (i, ψ) and the strategy d is used. Let

$$v_i(\psi) = \sup_{d \in D_i} \{v_i(\psi, d)\}. \tag{3.1.6}$$

Then there is a strategy $d^* \in D_i$ such that

$$v_i(\psi) = v_i(\psi, d^*). \tag{3.1.7}$$




Proof. For n = 1, 2, ..., let $x_{n-1} = (i_0, i_1, \ldots, i_{n-1})$, where $i_r = 1, \ldots, N$ (r = 0, ..., n-1). To each sequence $x_{n-1}$ let there correspond the N-ary number

$$a(x_{n-1}) = \sum_{r=0}^{n-1} (i_r - 1)\, N^{n-1-r}. \tag{3.1.8}$$

For fixed n and $i_0 = i$ we may then order the $N^{n-1}$ different nth-level decision functions $d_n(x_{n-1}; s_{n-1})$ as follows: $d_n(x_{n-1}; s_{n-1}) < d_n(x_{n-1}'; s_{n-1}')$ if and only if $a(x_{n-1}) < a(x_{n-1}')$.
Consider a strategy $d \in D_i$. Let the value of the rth member of the $N^{n-1}$ decision functions $d_n(x_{n-1}; s_{n-1})$ in d be denoted $\delta_{nr}$ (r = 1, 2, ..., $N^{n-1}$) and assume that the indexing follows the ordering just defined,

$$d_{n1} < d_{n2} < \cdots < d_{nN^{n-1}}, \qquad n = 1, 2, \ldots \tag{3.1.9}$$

This can be done since the $N^{n-1}$ possible sample histories $x_{n-1}$ which lead to the nth-level decision functions are all distinct. The strategy d can then be displayed as the ordered pair

$$d = (k, \delta), \tag{3.1.10}$$

where k is the initial alternative selected in state i and δ is the sequence

$$\delta = \{\delta_{11}, \delta_{21}, \ldots, \delta_{2N}, \delta_{31}, \ldots\} = \{\delta_1, \delta_2, \delta_3, \ldots\}. \tag{3.1.11}$$

Letting Δ denote the set of all possible sequences δ, we have

$$v_i(\psi) = \max_{1 \le k \le K_i}\, \sup_{\delta \in \Delta}\, v_i(\psi, k, \delta), \tag{3.1.12}$$

where, if d = (k, δ), $v_i(\psi, k, \delta) = v_i(\psi, d)$.
To each $\delta \in \Delta$ let there correspond the J-ary number

$$y(\delta) = \sum_{r=1}^{\infty} \delta_r J^{-r}, \tag{3.1.13}$$

where $\delta_r \in \{0, 1, \ldots, J-1\}$ for r = 1, 2, .... If δ = {0, 0, 0, ...} then y(δ) = 0 and if δ = {J-1, J-1, J-1, ...} then

$$y(\delta) = (J - 1) \sum_{r=1}^{\infty} J^{-r} = 1. \tag{3.1.14}$$

Thus, for any $\delta \in \Delta$, $0 \le y(\delta) \le 1$. Moreover, it is easily seen that the mapping (3.1.13) is a one-to-one mapping of the set Δ onto the closed interval [0,1]. Let $u^k_i(\psi, y)$ be a function defined on [0,1] by the relation

$$u^k_i(\psi, y(\delta)) = v_i(\psi, k, \delta), \qquad k = 1, \ldots, K_i. \tag{3.1.15}$$

Then (3.1.12) can be written

$$v_i(\psi) = \max_{1 \le k \le K_i}\, \sup_{0 \le y \le 1}\, u^k_i(\psi, y). \tag{3.1.16}$$


We now show that, for fixed k, $u^k_i(\psi, y)$ is continuous in y. Let

$$R = \max_{i,j,k} |r^k_{ij}|. \tag{3.1.17}$$

Let ε > 0 be given and choose a positive integer ν such that

$$\frac{2 \beta^{\nu} R}{1 - \beta} < \varepsilon. \tag{3.1.18}$$

For a fixed $y \in [0,1]$ let y' be any number such that $0 \le y' \le 1$ and $|y - y'| < J^{-\nu}$. Then, if y = y(δ) and y' = y(δ'), we have

$$\delta_r = \delta_r', \qquad r = 1, 2, \ldots, \nu, \tag{3.1.19}$$

$$|u^k_i(\psi, y) - u^k_i(\psi, y')| \le 2 \sum_{n=\nu}^{\infty} \beta^n R = \frac{2 \beta^{\nu} R}{1 - \beta} < \varepsilon. \tag{3.1.20}$$

Thus, $u^k_i(\psi, y)$ is a continuous function of y on the compact set [0,1] and, therefore, there exists a $y_k^* \in [0,1]$ such that

$$\sup_{0 \le y \le 1} u^k_i(\psi, y) = u^k_i(\psi, y_k^*). \tag{3.1.21}$$

Letting δ*(k) denote the inverse image of $y_k^*$,

$$v_i(\psi) = \max_{1 \le k \le K_i} \{v_i(\psi, k, \delta^*(k))\}, \tag{3.1.22}$$

and there exists a strategy, d* = (k*, δ*(k*)), such that

$$v_i(\psi) = v_i(\psi, d^*). \tag{3.1.23}$$





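In order to show the existence of a unique bounded set of continuous solutions to the functional equations (3.1.5) we shall make use of the method of successive approximations. Let the functions $v_i(n, \psi)$ be defined recursively as follows,

$$v_i(n+1, \psi) = \max_{1 \le k \le K_i} \Big\{ q^k_i(\psi) + \beta \sum_{j=1}^{N} \bar{p}^k_{ij}(\psi)\, v_j(n, T^k_{ij}(\psi)) \Big\}, \qquad i = 1, \ldots, N, \tag{3.2.1}$$

$$v_i(0, \psi) = V_i(\psi), \qquad i = 1, \ldots, N; \quad \psi \in \Psi, \tag{3.2.2}$$

where $V_i(\psi)$ is a set of bounded terminal functions.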
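It will be convenient to introduce some additional notation. Let

$$S^k_i(v, n, \psi) = q^k_i(\psi) + \beta \sum_{j=1}^{N} \bar{p}^k_{ij}(\psi)\, v_j(n, T^k_{ij}(\psi)), \qquad \psi \in \Psi; \quad k = 1, \ldots, K_i; \quad i = 1, \ldots, N, \tag{3.2.3}$$

and let $S^k_i(v, \psi)$ denote the same expression with $v_j(n, T^k_{ij}(\psi))$ replaced by $v_j(T^k_{ij}(\psi))$. Equation (3.2.1) may then be written

$$v_i(n+1, \psi) = \max_{1 \le k \le K_i} \{S^k_i(v, n, \psi)\}, \qquad i = 1, \ldots, N; \quad \psi \in \Psi. \tag{3.2.5}$$

Similarly, equation (3.1.5) becomes

$$v_i(\psi) = \max_{1 \le k \le K_i} \{S^k_i(v, \psi)\}, \qquad i = 1, \ldots, N; \quad \psi \in \Psi. \tag{3.2.6}$$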
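
The recursion (3.2.1) can be sketched for a small problem. The code below is an illustration under stated assumptions rather than the report's own computational scheme: it assumes a matrix beta prior whose index ψ is the parameter array, so that the prior mean of each transition probability is the parameter divided by its row sum and the posterior index after an observed transition adds one to the corresponding parameter. Terminal functions are taken to be zero, and the names `make_solver` and `bump` are hypothetical.

```python
from functools import lru_cache


def bump(psi, i, k, j):
    """Posterior index after observing i -> j under alternative k (add one count)."""
    return tuple(
        tuple(
            tuple(m + 1 if (a, b, c) == (i, k, j) else m for c, m in enumerate(row))
            for b, row in enumerate(alternatives)
        )
        for a, alternatives in enumerate(psi)
    )


def make_solver(rewards, beta):
    """rewards[i][k][j] is the reward for a transition i -> j under alternative k."""
    n_states = len(rewards)

    @lru_cache(maxsize=None)
    def v(n, i, psi):
        # psi[i][k][j] is the matrix beta parameter; terminal functions are zero.
        if n == 0:
            return 0.0
        best = float("-inf")
        for k, row in enumerate(psi[i]):
            total = sum(row)
            value = 0.0
            for j in range(n_states):
                p_bar = row[j] / total  # prior mean of the transition probability
                value += p_bar * (
                    rewards[i][k][j] + beta * v(n - 1, j, bump(psi, i, k, j))
                )
            best = max(best, value)
        return best

    return v


# Two states, two alternatives per state, uniform prior on every row.
psi = (((1, 1), (1, 1)), ((1, 1), (1, 1)))
rewards = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 0.0], [1.0, 1.0]]]
v = make_solver(rewards, beta=0.9)
print([v(n, 0, psi) for n in range(4)])
```

The number of distinct posterior indices grows quickly with n, so this direct recursion is practical only for short horizons; memoization merely avoids recomputing indices reached by different orderings of the same observations.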











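The first result is a lemma which will be used in subsequent proofs.


Lemma 3.2.1 If V is a bound for the terminal functions $V_i(\psi)$,

$$|V_i(\psi)| \le V, \qquad i = 1, \ldots, N, \tag{3.2.7}$$

and

$$R = \max_{i,j,k} |r^k_{ij}|, \tag{3.2.8}$$

then the functions $v_i(n, \psi)$ are bounded,

$$|v_i(n, \psi)| \le \beta^n V + \frac{1 - \beta^n}{1 - \beta} R, \qquad n = 0, 1, 2, \ldots; \quad i = 1, \ldots, N; \quad \psi \in \Psi; \quad 0 < \beta < 1. \tag{3.2.9}$$

Proof. The proof is inductive. Equation (3.2.9) obviously holds for n = 0. Assume it holds for n. Then, if k = a maximizes the right side of (3.2.1),

$$|v_i(n+1, \psi)| \le |q^a_i(\psi)| + \beta \sum_{j=1}^{N} \bar{p}^a_{ij}(\psi)\, |v_j(n, T^a_{ij}(\psi))| \le R + \beta^{n+1} V + \beta\, \frac{1 - \beta^n}{1 - \beta} R = \beta^{n+1} V + \frac{1 - \beta^{n+1}}{1 - \beta} R. \tag{3.2.10}$$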

Theorem 3.2.2 If the set of functions $\{v_i(n,\psi)\}$ is defined by
equations (3.2.1) and (3.2.2), then the limits

$$\lim_{n \to \infty} v_i(n,\psi) = v_i(\psi), \quad i = 1, \ldots, N;\ \psi \in \Psi \tag{3.2.11}$$

exist and $\{v_i(\psi)\}$ is a set of solutions to equation (3.1.5). Moreover,
the convergence is uniform in $\psi$.

Proof. It will be established inductively that, for arbitrary
positive integers, n and m,

$$|v_i(n,\psi) - v_i(m,\psi)| \le (\beta^n + \beta^m)\, V + \frac{|\beta^n - \beta^m|}{1-\beta}\, R, \quad i = 1, \ldots, N;\ n, m = 1, 2, \ldots;\ \psi \in \Psi \tag{3.2.12}$$

where $V$ is a bound on the terminal functions. Since $0 \le \beta < 1$, it then
follows by the Cauchy criterion that $\lim_{n \to \infty} v_i(n,\psi)$ exists for i = 1, ..., N.
That the limiting functions satisfy (3.1.5) follows by allowing n to go to
$\infty$ in equation (3.2.1). Uniform convergence of the sequences of functions
$\{v_i(n,\psi)\}$ follows by noting that the bound (3.2.12) is independent of
$\psi$.

To establish (3.2.12) we proceed as follows. Using the formulation
(3.2.5), for any fixed $\psi \in \Psi$ let

$$v_i(n,\psi) = S_i^a(v, n-1, \psi) = \max_{1 \le k \le K_i} S_i^k(v, n-1, \psi)$$

$$v_i(m,\psi) = S_i^b(v, m-1, \psi) = \max_{1 \le k \le K_i} S_i^k(v, m-1, \psi).$$

Then

$$v_i(n,\psi) - v_i(m,\psi) = S_i^a(v, n-1, \psi) - S_i^b(v, m-1, \psi) \le S_i^a(v, n-1, \psi) - S_i^a(v, m-1, \psi)$$

and, similarly,

$$v_i(n,\psi) - v_i(m,\psi) \ge S_i^b(v, n-1, \psi) - S_i^b(v, m-1, \psi).$$

Let $k^*$ index the maximum of $|S_i^a(v, n-1, \psi) - S_i^a(v, m-1, \psi)|$ and
$|S_i^b(v, n-1, \psi) - S_i^b(v, m-1, \psi)|$. Then

$$|v_i(n,\psi) - v_i(m,\psi)| \le |S_i^{k^*}(v, n-1, \psi) - S_i^{k^*}(v, m-1, \psi)|$$

$$\le \beta \sum_{j=1}^{N} \bar p_{ij}^{k^*}(\psi)\, \big|v_j\big(n-1, T_{ij}^{k^*}(\psi)\big) - v_j\big(m-1, T_{ij}^{k^*}(\psi)\big)\big|, \quad i = 1, \ldots, N;\ n, m = 1, 2, \ldots;\ \psi \in \Psi \tag{3.2.13}$$

Assuming that n > m, Lemma 3.2.1 implies the inequality

$$|v_i(n-m,\psi) - V_i(\psi)| \le (1 + \beta^{n-m})\, V + \frac{1-\beta^{n-m}}{1-\beta}\, R \tag{3.2.14}$$

An inductive argument, using (3.2.13), shows that

$$|v_i(n,\psi) - v_i(m,\psi)| \le (\beta^n + \beta^m)\, V + \frac{\beta^m - \beta^n}{1-\beta}\, R \tag{3.2.15}$$

A similar argument in the case n < m yields (3.2.12). Q.E.D.

Theorem 3.2.3 There exists a unique set of bounded functions
$\{v_i(\psi)\}$ which satisfies the set of equations

$$v_i(\psi) = \max_{1 \le k \le K_i} \sum_{j=1}^{N} \bar p_{ij}^k(\psi)\left[ r_{ij}^k + \beta\, v_j\big(T_{ij}^k(\psi)\big) \right], \quad i = 1, \ldots, N;\ \psi \in \Psi;\ 0 \le \beta < 1 \tag{3.2.16}$$

Proof. Theorem 3.2.2 established the existence of at least one set
of functions $\{v_i(\psi)\}$ which satisfy (3.2.16). Lemma 3.2.1 implies that
this set of functions is bounded:

$$|v_i(\psi)| = \lim_{n \to \infty} |v_i(n,\psi)| \le \frac{R}{1-\beta}, \quad i = 1, \ldots, N;\ \psi \in \Psi \tag{3.2.17}$$

To establish uniqueness, assume that there exist two sets of bounded
functions, $\{v_i(\psi)\}$ and $\{w_i(\psi)\}$, which satisfy (3.2.16). For any
i and an arbitrary $\psi \in \Psi$, let

$$v_i(\psi) = S_i^a(v, \infty, \psi)$$

$$w_i(\psi) = S_i^b(w, \infty, \psi).$$

Then, arguing as in the proof of Theorem 3.2.2,

$$S_i^b(v, \infty, \psi) - S_i^b(w, \infty, \psi) \le v_i(\psi) - w_i(\psi) \le S_i^a(v, \infty, \psi) - S_i^a(w, \infty, \psi) \tag{3.2.18}$$

Letting $k^*$ index the maximum of $|S_i^a(v, \infty, \psi) - S_i^a(w, \infty, \psi)|$ and
$|S_i^b(v, \infty, \psi) - S_i^b(w, \infty, \psi)|$,

$$|v_i(\psi) - w_i(\psi)| \le \beta \sum_{j=1}^{N} \bar p_{ij}^{k^*}(\psi)\, \big|v_j\big(T_{ij}^{k^*}(\psi)\big) - w_j\big(T_{ij}^{k^*}(\psi)\big)\big|, \quad i = 1, \ldots, N;\ \psi \in \Psi \tag{3.2.19}$$

Since $v_i(\psi)$ and $w_i(\psi)$ are both bounded functions of $\psi$, there exists a
number, M > 0, such that

$$|v_i(\psi) - w_i(\psi)| \le M, \quad i = 1, \ldots, N;\ \psi \in \Psi \tag{3.2.20}$$

Repeated application of (3.2.19) then yields the inequality

$$|v_i(\psi) - w_i(\psi)| \le \beta^n M, \quad i = 1, \ldots, N;\ n = 1, 2, \ldots;\ \psi \in \Psi \tag{3.2.21}$$

Since $0 \le \beta < 1$, it then follows that

$$v_i(\psi) = w_i(\psi), \quad i = 1, \ldots, N;\ \psi \in \Psi \tag{3.2.22}$$


Theorem 3.2.4 If $\{v_i(\psi)\}$ is the unique bounded set of functions
which satisfy equation (3.1.5), and if $\bar p_{ij}^k(\psi)$ is a continuous function
of $\psi$ (k = 1, ..., K_i; i, j = 1, ..., N), then $v_i(\psi)$ is a continuous function
of $\psi$ (i = 1, ..., N).

Proof. Consider the functions $v_i(n,\psi)$ defined by equations (3.2.1)
and (3.2.2). Choosing a set of terminal functions $\{V_i(\psi)\}$ each member
of which is continuous on $\Psi$, it follows inductively that $v_i(n,\psi)$ is
continuous (i = 1, ..., N; n = 0, 1, 2, ...). By Theorem 3.2.2

$\{v_i(n,\psi)\} \to v_i(\psi)$ uniformly and, therefore, $v_i(\psi)$ is continuous.








The functions $v_i(n,\psi)$ defined by equations (3.2.1) and (3.2.2) can
be used as successive approximations in the numerical calculation of
$v_i(\psi)$ at some fixed generalized state, $(i,\psi)$. In this section we
derive conditions under which the sequence of functions $\{v_i(n,\psi)\}$
converges monotonically and find a bound for the error of the nth
approximant, $e_i(n,\psi) = v_i(\psi) - v_i(n,\psi)$. The section concludes with a
proof that the optimal sampling strategy of the n-step problem defined
by (3.2.1) and (3.2.2) converges to an optimal sampling strategy for
the infinite horizon problem defined by (3.1.5).














Theorem 3.3.1 Let the terminal functions of equation (3.2.2) be

$$V_i(\psi) = V_i, \quad i = 1, \ldots, N;\ \psi \in \Psi \tag{3.3.1}$$

and let

$$v' = \min_i \{V_i\}, \qquad V' = \max_i \{V_i\} \tag{3.3.2}$$

$$r = \min_{i,j,k} \{r_{ij}^k\}, \qquad R = \max_{i,j,k} \{r_{ij}^k\} \tag{3.3.3}$$

If

$$V' - \beta v' \le r \tag{3.3.4}$$

then, for i = 1, ..., N, the functions $v_i(n,\psi)$ defined by equations
(3.2.1) and (3.2.2) form a monotone increasing sequence which converges
uniformly to $v_i(\psi)$, the unique bounded solution of equation (3.1.5).
Similarly, if

$$v' - \beta V' \ge R \tag{3.3.5}$$

then the functions $v_i(n,\psi)$ form a monotone decreasing sequence which
converges uniformly to $v_i(\psi)$.

Proof. We will first show inductively that, if (3.3.4) holds, the
sequences $\{v_i(n,\psi)\}$ are monotone increasing for each i = 1, ..., N
and each $\psi \in \Psi$. Uniform convergence of $\{v_i(n,\psi)\}$ to $v_i(\psi)$ has
already been demonstrated. If equation (3.3.4) is satisfied, then

$$v_i(1,\psi) - v_i(0,\psi) = \max_{1 \le k \le K_i}\Big\{ \bar q_i^k(\psi) + \beta \sum_{j=1}^{N} \bar p_{ij}^k(\psi)\, V_j \Big\} - V_i$$

$$\ge r + \beta v' - V' \ge 0, \quad i = 1, \ldots, N;\ \psi \in \Psi \tag{3.3.6}$$

Assume that $v_i(n,\psi) - v_i(n-1,\psi) \ge 0$ for i = 1, ..., N and $\psi \in \Psi$. Then, if

$$v_i(n+1,\psi) = S_i^a(v, n, \psi)$$

$$v_i(n,\psi) = S_i^b(v, n-1, \psi),$$

the inductive hypothesis implies that

$$v_i(n+1,\psi) - v_i(n,\psi) = S_i^a(v, n, \psi) - S_i^b(v, n-1, \psi)$$

$$\ge S_i^b(v, n, \psi) - S_i^b(v, n-1, \psi)$$

$$= \beta \sum_{j=1}^{N} \bar p_{ij}^b(\psi)\left[ v_j\big(n, T_{ij}^b(\psi)\big) - v_j\big(n-1, T_{ij}^b(\psi)\big) \right]$$

$$\ge 0, \quad i = 1, \ldots, N;\ \psi \in \Psi \tag{3.3.7}$$

proving the induction. If (3.3.5) holds,

$$v_i(1,\psi) - v_i(0,\psi) \le R + \beta V' - v' \le 0, \quad i = 1, \ldots, N;\ \psi \in \Psi \tag{3.3.8}$$

That $v_i(n+1,\psi) \le v_i(n,\psi)$ is then easily established by induction in
a manner similar to that displayed in equation (3.3.7). Q.E.D.


Let the error of the nth approximant be defined as

$$e_i(n,\psi) = v_i(\psi) - v_i(n,\psi), \quad i = 1, \ldots, N;\ n = 0, 1, 2, \ldots;\ \psi \in \Psi \tag{3.3.9}$$

If $\{v_i(n,\psi)\}$ is a sequence of functions which converges monotonically
to $v_i(\psi)$ then $\{e_i(n,\psi)\}$ is a sequence of functions which converges
monotonically to zero. In this case, if $\epsilon > 0$ is an error bound which
is acceptable to the decision-maker and if $n^*$ is the smallest positive
integer such that

$$|e_i(n^*,\psi)| < \epsilon \tag{3.3.10}$$

then $v_i(n^*,\psi)$ is an acceptable approximation to $v_i(\psi)$ and the sampling
strategy resulting in $v_i(n^*,\psi)$ is an acceptable approximation to the
first $n^*$ levels of an optimal sampling strategy.

It is not necessary to require that the successive approximants
$v_i(n,\psi)$ converge monotonically to $v_i(\psi)$ and, in fact, a non-monotonic
sequence $\{v_i(n,\psi)\}$ may converge more rapidly than a monotonic sequence.
Theorem 3.3.3 provides a bound for $e_i(n,\psi)$ assuming nothing about
monotonicity. A lemma is first required.


Lemma 3.3.2 Let $r$ and $R$ be defined by equation (3.3.3). Then
$v_i(\psi)$, the solution of (3.1.5), has the bounds

$$\frac{r}{1-\beta} \le v_i(\psi) \le \frac{R}{1-\beta}, \quad i = 1, \ldots, N;\ \psi \in \Psi;\ 0 \le \beta < 1 \tag{3.3.11}$$

Proof. The mean reward per transition under any policy has the
bounds

$$r \le \bar q_i^k(\psi) \le R, \quad k = 1, \ldots, K_i;\ i = 1, \ldots, N;\ \psi \in \Psi \tag{3.3.12}$$

Since the expected discounted reward over an infinite period under any
strategy is the sum of the expected rewards at each transition, the
reward of the nth transition being weighted by $\beta^n$, the maximum total
reward over all strategies has the bounds

$$\frac{r}{1-\beta} = \sum_{n=0}^{\infty} \beta^n r \le v_i(\psi) \le \sum_{n=0}^{\infty} \beta^n R = \frac{R}{1-\beta}, \quad i = 1, \ldots, N;\ \psi \in \Psi \tag{3.3.13}$$

from which (3.3.11) follows. Q.E.D.


Theorem 3.3.3 Let $\{v_i(n,\psi)\}$ be a sequence of successive
approximations defined by equations (3.2.1) and (3.2.2), with constant
terminal functions,

$$V_i(\psi) = V_i, \quad i = 1, \ldots, N;\ \psi \in \Psi \tag{3.3.14}$$

Let $v'$, $V'$, $r$, and $R$ be defined by (3.3.2) and (3.3.3). Then the error
of the nth approximant has the bound

$$|e_i(n,\psi)| \le \beta^n \max\left\{ \frac{R}{1-\beta} - v',\ V' - \frac{r}{1-\beta} \right\}, \quad i = 1, \ldots, N;\ n = 0, 1, 2, \ldots;\ \psi \in \Psi;\ 0 \le \beta < 1 \tag{3.3.15}$$

Proof. The proof is inductive. For n = 0,

$$e_i(0,\psi) = v_i(\psi) - V_i.$$

If $v_i(\psi) \ge V_i$ then, by Lemma 3.3.2,

$$|e_i(0,\psi)| = v_i(\psi) - V_i \le \frac{R}{1-\beta} - v', \quad i = 1, \ldots, N;\ \psi \in \Psi \tag{3.3.16}$$

If, on the other hand, $v_i(\psi) < V_i$, Lemma 3.3.2 implies

$$|e_i(0,\psi)| = V_i - v_i(\psi) \le V' - \frac{r}{1-\beta}, \quad i = 1, \ldots, N;\ \psi \in \Psi \tag{3.3.17}$$

Therefore, in either case,

$$|e_i(0,\psi)| \le \max\left\{ \frac{R}{1-\beta} - v',\ V' - \frac{r}{1-\beta} \right\}, \quad i = 1, \ldots, N;\ \psi \in \Psi \tag{3.3.18}$$

It is to be noted that at least one of the two terms, $\frac{R}{1-\beta} - v'$ and
$V' - \frac{r}{1-\beta}$, is nonnegative, for, if not, we have the contradiction

$$V' < \frac{r}{1-\beta} \le \frac{R}{1-\beta} < v'.$$

Having established that equation (3.3.15) is valid for n = 0, assume it
holds for n. Let

$$v_i(\psi) = S_i^a(v, \infty, \psi)$$

$$v_i(n+1,\psi) = S_i^b(v, n, \psi).$$

Then

$$S_i^b(v, \infty, \psi) - S_i^b(v, n, \psi) \le e_i(n+1,\psi) \le S_i^a(v, \infty, \psi) - S_i^a(v, n, \psi) \tag{3.3.19}$$

Let $k^*$ index the maximum of $|S_i^a(v, \infty, \psi) - S_i^a(v, n, \psi)|$ and
$|S_i^b(v, \infty, \psi) - S_i^b(v, n, \psi)|$. Then

$$|e_i(n+1,\psi)| \le |S_i^{k^*}(v, \infty, \psi) - S_i^{k^*}(v, n, \psi)|$$

$$\le \beta \sum_{j=1}^{N} \bar p_{ij}^{k^*}(\psi)\, \big|e_j\big(n, T_{ij}^{k^*}(\psi)\big)\big|$$

$$\le \beta^{n+1} \max\left\{ \frac{R}{1-\beta} - v',\ V' - \frac{r}{1-\beta} \right\} \tag{3.3.20}$$


Corollary 3.3.4 Let the terminal functions $V_i(\psi)$ be constants,

$$V_i(\psi) = V_i, \quad i = 1, \ldots, N;\ \psi \in \Psi \tag{3.3.21}$$

If

$$V' - \beta v' \le r \tag{3.3.22}$$

the error of the nth approximant has the bounds

$$0 \le e_i(n,\psi) \le \beta^n \left( \frac{R}{1-\beta} - v' \right), \quad i = 1, \ldots, N;\ n = 0, 1, 2, \ldots;\ \psi \in \Psi;\ 0 \le \beta < 1 \tag{3.3.23}$$

If

$$v' - \beta V' \ge R \tag{3.3.24}$$

$e_i(n,\psi)$ has the bounds

$$\beta^n \left( \frac{r}{1-\beta} - V' \right) \le e_i(n,\psi) \le 0, \quad i = 1, \ldots, N;\ n = 0, 1, 2, \ldots;\ \psi \in \Psi;\ 0 \le \beta < 1 \tag{3.3.25}$$

Proof. If equation (3.3.22) is satisfied Theorem 3.3.1 implies that
$e_i(n,\psi) \ge 0$. Moreover, (3.3.22) implies that $v'(1-\beta) \le R$ and $V'(1-\beta) \le r$;
hence, that $\frac{R}{1-\beta} - v' \ge V' - \frac{r}{1-\beta}$. Equation (3.3.15) then yields the
upper inequality of (3.3.23). Similarly, if equation (3.3.24) is
satisfied, then $V'(1-\beta) \ge r$ and $v'(1-\beta) \ge R$; hence $V' - \frac{r}{1-\beta} \ge \frac{R}{1-\beta} - v'$.
The bounds of (3.3.25) then follow from Theorem 3.3.1 and equation (3.3.15).
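The bound (3.3.15) gives an a priori estimate of how many iterations are needed for a prescribed accuracy. The following minimal sketch is not taken from the report: the function names and the numerical values in the usage lines are illustrative assumptions, and `v_lo`, `V_hi` stand for v' and V'.

```python
def error_bound(n, beta, v_lo, V_hi, r, R):
    """Right side of (3.3.15) for constant terminal values.

    v_lo, V_hi: smallest and largest terminal values (v' and V').
    r, R: lower and upper reward bounds of (3.3.3).
    """
    return beta**n * max(R / (1.0 - beta) - v_lo, V_hi - r / (1.0 - beta))


def iterations_needed(eps, beta, v_lo, V_hi, r, R):
    """Smallest n for which the bound (3.3.15) falls below eps."""
    n = 0
    while error_bound(n, beta, v_lo, V_hi, r, R) >= eps:
        n += 1
    return n


# Hypothetical data: zero terminal values, rewards between 3 and 10.
n_star = iterations_needed(eps=0.005, beta=0.2, v_lo=0.0, V_hi=0.0, r=3.0, R=10.0)
```

Because the bound decays geometrically in beta, a small discount factor gives few iterations; the count is an upper estimate, since the actual error may be smaller than the bound.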


Let there correspond to the nth approximant $v_i(n,\psi)$, defined by
(3.2.1) and (3.2.2), the n-step optimal sampling strategy $d^*(n)$. At
least one such optimal strategy exists since there are a finite number
of different sampling strategies for the n-step problem; there may be
more than one n-step optimal strategy. The next theorem demonstrates
that as $n \to \infty$, any n-step optimal strategy converges to an optimal
sampling strategy for the adaptive control model of equation (3.1.5).
We must first precisely define what is meant by convergence of a
sampling strategy.

Let the generalized state $(i,\psi)$ be fixed. To every n-step
sampling strategy $d(n)$ there corresponds an ordered pair $(k_n, y_n)$, where
$k_n = 1, \ldots, K_i$ and $y_n \in [0,1]$ is defined by equation (3.1.13) with
[condition illegible in source]. Let $d^* = (k^*, y^*)$ be a sampling strategy for
the infinite horizon model of (3.1.5). Then we say $\lim_{n \to \infty} d(n) = d^*$
if, given $\epsilon > 0$, there is a positive integer $\nu$ such that, for all $n > \nu$,
$k_n = k^*$ and $|y_n - y^*| < \epsilon$. This definition implies that, given an
arbitrarily large positive integer $\lambda$, there exists an integer $\nu$ such
that, for all $n > \nu$, the values of the decision functions on the first
$\lambda$ levels of $d(n)$ are equal to the values of the decision functions on
the first $\lambda$ levels of $d^*$.














Theorem 3.3.5 Let the generalized state $(i,\psi)$ be fixed and let
$\Delta \subset D$ denote the set of optimal sampling strategies for the adaptive
control problem of equation (3.1.5). If $d^*(n)$ is an n-step optimal
sampling strategy for the problem defined by (3.2.1) and (3.2.2), then

$$\lim_{n \to \infty} d^*(n) = d^* \tag{3.3.26}$$

exists and $d^* \in \Delta$.

Proof. Let $k_n$ denote the initial alternative selected in the
n-step optimal sampling strategy $d^*(n)$ and let $K_\Delta$ be the set of
alternatives in state i which are initial selections for an optimal
strategy in $\Delta$. We first show that $\lim_{n \to \infty} k_n = k^* \in K_\Delta$.

Using the notation of equations (3.2.3) and (3.2.4),

$$v_i(n,\psi) = S_i^{k_n}(v, n-1, \psi) \tag{3.3.27}$$

and, for any $k \in K_\Delta$ and $a \notin K_\Delta$,

$$v_i(\psi) = S_i^k(v, \infty, \psi) > S_i^a(v, \infty, \psi) \tag{3.3.28}$$

Assume that $\{k_n\}$ does not converge to a member of $K_\Delta$. Then there
exists a subsequence $\{k_{n_\nu}\}$ such that

$$k_{n_\nu} \notin K_\Delta, \quad \nu = 1, 2, \ldots \tag{3.3.29}$$

Let $\epsilon$ be chosen such that

$$0 < \epsilon < \min_{a \notin K_\Delta} \left[ v_i(\psi) - S_i^a(v, \infty, \psi) \right] \tag{3.3.30}$$

Since $\lim_{n \to \infty} v_i(n,\psi) = v_i(\psi)$, we have, by (3.3.27),

$$|v_i(\psi) - S_i^{k_n}(v, n-1, \psi)| < \frac{\epsilon}{2} \tag{3.3.31}$$

for all n sufficiently large. Also, using (3.2.3) and (3.2.4),

$$\lim_{n \to \infty} S_i^k(v, n-1, \psi) = S_i^k(v, \infty, \psi), \quad k = 1, \ldots, K_i \tag{3.3.32}$$

and there exists an integer $\nu$ such that

$$|S_i^{k_n}(v, n-1, \psi) - S_i^{k_n}(v, \infty, \psi)| < \frac{\epsilon}{2} \tag{3.3.33}$$

Thus, combining (3.3.31) and (3.3.33), there exists an integer $\nu$ such
that $k_{n_\nu} \notin K_\Delta$ and

$$v_i(\psi) - S_i^{k_{n_\nu}}(v, \infty, \psi) < \epsilon \tag{3.3.34}$$

contradicting (3.3.30). It follows that $\lim_{n \to \infty} k_n = k^*$ exists and that
$k^* \in K_\Delta$.

Given a positive integer $\lambda$, the same proof can be applied to each
selection of an alternative in the first $\lambda$ levels of the sampling
strategy $d^*(n)$. Since, having fixed $\lambda$, there are a finite number of
such alternatives, there exists a positive integer $\nu$ such that for all
$n > \nu$, the decision functions in the first $\lambda$ levels of $d^*(n)$ have the











[...] which appear throughout this report. It is to be noted that,
although these equations resemble a classical iterative format for
successive approximations, $v_i(n,\psi)$ is computed, not in terms of
$v_i(n-1,\psi)$, but in terms of $v_j(n-1, T_{ij}^k(\psi))$. Computation of $v_i(n,\psi)$
for a specific value of $(i,\psi)$ involves the evaluation of between
$(k_* N)^n$ and $(k^* N)^n$ terminal values $V_j(\psi')$, where $k^* = \max_i K_i$
and $k_* = \min_i K_i$.

One way to compute $v_i(n,\psi)$ is to start by evaluating and storing all
required values of $V_j(\psi') = v_j(0,\psi')$, then to compute and store all
required values of $v_j(1,\psi')$, using the results of the previous computation
of $v_j(0,\psi')$. In general, for $\nu = 1, 2, \ldots, n-1$, $v_j(\nu,\psi')$ is computed
in terms of a grid of values of $v_j(\nu-1,\psi')$ and is stored for use at the
next stage in computing $v_j(\nu+1,\psi')$.

Since the number of terminal values $V_j(\psi')$ which are needed grows
exponentially with n, it is clear that considerable storage capacity is
required. For even moderately large n, tape or disc storage must be
used. Moreover, a fairly complex indexing routine must be programmed in
order to utilize core memory efficiently.

An alternative approach is to evaluate $v_i(n,\psi)$ recursively. Using
this method, computation starts with the nth level rather than the zeroth.
In general, the routine, at the $(\nu+1)$th level of computation,
starts to evaluate $v_i(\nu+1,\psi)$ for some pair $(i,\psi)$. This level of
computation is suspended when a value at the $\nu$th level, $v_j(\nu,\psi')$, is
required. Certain key portions of the $(\nu+1)$th level of computation are
stored on a push-down list and the routine then calls itself, entering the
$\nu$th level of computation to evaluate $v_j(\nu,\psi')$. Recursion is halted at
the zeroth level when $V_j(\psi')$ is computed. The results of lower level
computations are then fed back, in succession, to higher levels. Having
obtained the value of $v_j(\nu,\psi')$ in this manner, the $(\nu+1)$th level of
computation reclaims its partially completed calculations from the
push-down list and completes them. This succession of events continues until
$v_i(n,\psi)$ is evaluated.
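The recursive scheme just described can be sketched in a modern language, with the language's call stack playing the role of the push-down list. This is a minimal illustration and not the MAD program of the report. It assumes a matrix beta prior in which each row under each alternative has parameters `m[i][k][j] > 0`, the mean transition probability is the parameter divided by its row sum, and observing a transition from i to j under alternative k adds one to `m[i][k][j]`; the function name and argument layout are hypothetical.

```python
def v_recursive(i, n, m, r, beta, terminal):
    """Evaluate v_i(n, psi) by direct recursion on (3.2.1) and (3.2.2).

    m[i][k][j]: assumed prior parameters (lists of lists of lists of floats).
    r[i][k][j]: transition rewards.
    terminal(i, m): terminal function V_i(psi), called at level zero.
    """
    if n == 0:
        return terminal(i, m)
    best = float("-inf")
    for k in range(len(m[i])):
        row = m[i][k]
        total = sum(row)
        value = 0.0
        for j in range(len(row)):
            p_bar = row[j] / total          # mean transition probability
            saved = row[j]
            row[j] = saved + 1              # posterior parameter T_ij^k(psi)
            value += p_bar * (r[i][k][j]
                              + beta * v_recursive(j, n - 1, m, r, beta, terminal))
            row[j] = saved                  # restore psi on the way back up
        best = max(best, value)
    return best


# Hypothetical two-state example with one alternative per state and unit rewards.
m_demo = [[[1.0, 2.0]], [[3.0, 1.0]]]
r_demo = [[[1.0, 1.0]], [[1.0, 1.0]]]
value_demo = v_recursive(0, 3, m_demo, r_demo, 0.5, lambda i, m: 0.0)
```

Only the current path of parameter changes is held at any time, so storage grows linearly with n, while the number of terminal evaluations still grows exponentially.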






































The storage requirements for recursive calculation of $v_i(n,\psi)$ consist
only of the space needed for storage of intermediate computations on the
push-down list and, therefore, increase linearly with n. Thus, the
recursive method has the advantage of requiring considerably less storage
than the first method described. Since specific values of $v_j(\nu,\psi')$, for
$\nu = 0, 1, \ldots, n-1$, may have to be recalculated many times in the
recursive method, we are essentially trading running time for storage. It
should be noted, however, that if the first method requires tape handling,
the recursive method may reduce overall running time.
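The trade between running time and storage can be made concrete by counting terminal evaluations with and without a table of stored intermediate values. The sketch below is illustrative only: it assumes the same matrix beta updating rule as above (add one to the observed cell), zero terminal functions, and hypothetical names; `functools.lru_cache` stands in for the stored grid of values of the first method.

```python
from functools import lru_cache


def make_evaluator(r, beta, memoize):
    """Return (evaluate, calls); evaluate(i, n, m) computes v_i(n, psi).

    m is a nested tuple m[i][k][j] of assumed prior parameters, so that a
    generalized state (i, n, m) is hashable and can be stored when memoize
    is True. Terminal functions are taken to be zero (an assumption).
    """
    calls = {"terminal": 0}

    def step(i, n, m):
        if n == 0:
            calls["terminal"] += 1
            return 0.0
        best = float("-inf")
        for k, row in enumerate(m[i]):
            total = sum(row)
            value = 0.0
            for j, m_ij in enumerate(row):
                # Build the posterior parameter T_ij^k(psi) as a new tuple.
                new_row = row[:j] + (m_ij + 1,) + row[j + 1:]
                new_i = m[i][:k] + (new_row,) + m[i][k + 1:]
                new_m = m[:i] + (new_i,) + m[i + 1:]
                value += (m_ij / total) * (r[i][k][j]
                                           + beta * evaluate(j, n - 1, new_m))
            best = max(best, value)
        return best

    evaluate = lru_cache(maxsize=None)(step) if memoize else step
    return evaluate, calls


# Hypothetical problem: two states, two alternatives per state.
m0 = (((1, 1), (1, 1)), ((1, 1), (1, 1)))
r0 = (((1.0, 0.0), (0.0, 2.0)), ((3.0, 1.0), (0.5, 0.5)))

plain, plain_calls = make_evaluator(r0, 0.2, memoize=False)
stored, stored_calls = make_evaluator(r0, 0.2, memoize=True)
v_plain = plain(0, 4, m0)
v_stored = stored(0, 4, m0)
```

With two alternatives and two successor states at every level, the plain recursion reaches (2 x 2)^4 = 256 terminal values for n = 4, while the stored version evaluates each distinct generalized state once at the price of keeping the table in memory.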
The general theory of recursive computation is described by McCarthy
[29]. Programming languages of the ALGOL family [32] are capable of
recursive computations, as are most list processing languages; it is
possible to do recursive programming in FORTRAN II [4]. The recursive
routines which were written for this report used the MAD language [3].

Utilizing the recursive method, a program was written to evaluate
equations (3.2.1) and (3.2.2) for specific pairs $(i,\psi)$ when $\tilde P$ has the
matrix beta distribution. This program is contained in Appendix B.
Numerical results obtained from the program are presented in the






























state. Let the reward matrix be


Ge ~ ; hh F 23 i, (3eGek3 
ay 3 le a X 
§& 46 km 2 = 


Oe 


Assume that the prior distribution of $\tilde P$ is a matrix beta distribution
with parameter


Mm = 10.0781 6.0370 (3.502) 
S 4.0076 0.3368 
005586 


309099 
0.2888 0.2492 









ge ne GC? 0.35 (365 03D 
0.799 SPAS 9) 
0.125 0 e375 
0.625 0.9575 
i afl we My ado-Re 
po Cry Paps Bee Pyoe Dag © Pons oe? Teo) 
p oe § CBl» 
The variance-covariance matrix of this distribution is
i. (= BS ~ Bs (3eS0i) 
ab eo = 
| 
af} OOD 6,080 
0.029 29 G20 
«(Oc Pee ¥) 
0.489 030 
of} 9 LEG Gok | 
Let the discount factor be $\beta = 0.2$.
Table 3.5.1 lists values of

$$v(n,M) = \begin{bmatrix} v_1(n,M) \\ v_2(n,M) \end{bmatrix} \tag{3.5.5}$$

using the terminal functions

$$V_i(M) = 0.000, \quad i = 1, 2 \tag{3.5.6}$$







of2= 


lay) | Zn) : amW) | Ala) 
) 2006 metal 23.667 
8.002 4 e585 & O60 
10.999 2 2668 
20 ets? | O75 0.8) 
13,412 2 0.555 
10.445 i 0.699 6.169 
23.962 2 0.205 
10 499 i 0.038 0.037 
13.646 2 0.021 
16.528 ¥ 0,003 0.2006 
13.66% 2 0.963 
40.517 i 0.000 9.604 
13.667 2 0.000 


| : 0.000 
Computation time: 5 minutes.


Takia A502 


A(n) 
o-"(n) alr, WZ) a eee ain 
ne ee A ee = 
4 ee ve el = ? 
| 30738 , 9.987 
36.730 . 
i 1.765 a 
; 13,969 : i 
, 0.650 
0325 
10.192 . onto 
F 13.252 3 ee 
‘ 0.069 ; 
- 0026 
“ 0.622 8 
ds Bite 9 6.045 
ae 0.005 
002 
; 91002 
ge 
0-002 
2000 
i fee 
- : Soy oe 
a a 
7 6 e& o.2 











s(n, 7M ) T(r) 
10.253 ecoeoesens 
23.278 ootonaes 
10.258 4 
13.276 2 
26 £70 2 
43.986 2 
10.507 q 
13.652 2 
10.546 4 
13.665 2 
49.537 4 
13.667 2 
40.518 i 
13.668 2 


Compute thon tine: § mimtea. 


aln In } 
a --_ 
a 
. ‘ Buena Ook Mareen? SoS B abies To. 


0.265 
6-356 


0.26% 
G.392 


0,028 
0.082 


0.042 
0.026 


9.003 


6.002 
©.002 


0.000 
9 e030 


HM) 2 





0.678 


9.016 


0.003 


0.002 


10.253 
13.278 








Since $r = 3 > 0$, the convergence is monotone increasing. It is seen that
convergence to two decimal places has occurred by the sixth iteration.
The optimal initial policy vector,

$$\sigma^*(n) = \begin{bmatrix} \sigma_1^*(n) \\ \sigma_2^*(n) \end{bmatrix} \tag{3.5.7}$$

is recorded in the third column, where $\sigma_i^*(n)$ is the initial decision,
k, when the system starts from state i and the n-step optimal sampling
strategy is $d(n) = (k, \delta(n))$.

Using $v(6,M)$ as the limiting vector $v(M)$, the error vector of
the nth iterate, $e(n,M)$, was computed as

$$e(n,M) = \begin{bmatrix} 10.517 \\ 13.667 \end{bmatrix} - v(n,M) \tag{3.5.8}$$

and is displayed in the fourth column of Table 3.5.1. The last column of
the table contains the error bound,

$$\Delta(n) = \beta^n \max\left\{ \frac{R}{1-\beta} - v',\ V' - \frac{r}{1-\beta} \right\}$$

defined by equation (3.3.15). Note that the bound, $\Delta(n)$, accurately
predicts--in this example--the number of iterations required for two-place
accuracy.














The computations shown in Table 3.5.2 are similar to those of Table
3.5.1, except that the terminal functions are

$$V_i(M) = \frac{r}{1-\beta} = 3.750, \quad i = 1, 2 \tag{3.5.9}$$

The convergence is still monotone increasing and the error functions,
$e_i(n,M)$, are reduced to approximately 2/3 of the corresponding values in
Table 3.5.1. Five iterations are required in this case for two-place
accuracy.

In Table 3.5.3 the terminal functions are the maximum expected
discounted rewards when the process is operated indefinitely under a single
policy (in this case, the policy (1,2); cf. Section 4.3),

$$V(M) = \begin{bmatrix} 10.253 \\ 13.278 \end{bmatrix}$$

Convergence is monotonic after the first iteration. The error vector is
significantly reduced as compared with corresponding entries in Tables
3.5.1 and 3.5.2. Four iterations are necessary to obtain two-place
accuracy in this instance.
















When the discount factor is unity the criterion of maximizing the
total expected reward over an infinite period is no longer useful since,
with the possible exception of a set of sample histories of measure zero,
the expected reward over an infinite period under any strategy diverges
to $+\infty$ or $-\infty$. An alternative criterion is to maximize the
rate at which the process earns rewards in the steady state, or the
expected gain of the process. But this criterion is not really precise
since the decision-maker can, in the adaptive control process, change
alternatives in any state at any time, and it is not certain that a
steady state will ever be reached. Moreover, among those strategies which
do lead to a steady state and which maximize the gain, there are an
arbitrarily large number which are virtually equivalent--those strategies
in which each alternative is sampled a large (but finite) number of times,
then a fixed policy is chosen under almost perfect information. Further
remarks on this class of strategies will be made in Section 5.5.

Since, for each $\beta \in (0,1)$, there is a well-defined criterion leading
to an optimal policy, an alternative approach to the undiscounted process
is to let $\beta \to 1$ in the discounted adaptive control problem as formulated
in equation (3.1.5). For fixed $\beta$, let $\sigma(\beta) = (\sigma_1(\beta), \ldots, \sigma_N(\beta))$ be
an optimal initial policy, where $\sigma_i(\beta)$ is the maximizing value of k in
equation (3.1.5) for a fixed $\psi \in \Psi$. We shall call $\sigma(\beta)$ a $\beta$-optimal
policy. If, for some $\beta_0 \in (0,1)$,

$$\sigma(\beta) = \sigma^*, \quad \beta_0 \le \beta < 1 \tag{3.6.1}$$

we shall call $\sigma^*$ an optimal initial policy for the undiscounted adaptive
control problem. The existence and nature of optimal policies as
defined by (3.6.1) are matters for future investigation. Blackwell [11]
and Derman [16] have used this approach to undiscounted decision problems
in a Markov chain with alternatives when the transition probabilities are







CHAPTER 4
EXPECTED STEADY-STATE PROBABILITIES
AND RELATED QUANTITIES

















Consider a Markov chain with alternatives which is operated under a
fixed policy, $\sigma$. Let $\tilde P(\sigma)$ denote the N x N matrix of transition
probabilities, assumed to have the prior distribution $F(P|\psi)$.
In this chapter we examine some functions of $\tilde P$ which are of importance in
decision problems, with particular attention devoted to the problem of
computing the means, variances, and covariances of these quantities.

Section 4.1 deals with the n-step transition probabilities and with
the expected discounted reward over n transitions. The second section is
concerned with the steady-state probability vector. In Section 4.3 we
consider the expected discounted reward over an infinite number of
transitions when a fixed policy, $\sigma$, is used and, in the final section,
some results concerning the expected reward per transition, or process gain,
are presented. These quantities are, of course, important on their own
merits; the results derived here will be applied to various terminal
control models in the following chapter.





Throughout this chapter it will be assumed that a specific policy,
$\sigma$, is in force and that the Markov chain is governed by the N x N
stochastic matrix, $\tilde P = \tilde P(\sigma)$, and that the matrix of transition rewards
is $R = R(\sigma) = [r_{ij}]$. In most cases, the dependence of various functions on
$\sigma$ will not be made explicit in order to simplify the notation to some
extent.











4.1 The n-Step Transition Probabilities

If P is a stochastic matrix governing the transitions in a Markov
chain, then the probability that the system is in state j after n transitions,
given that the system started in state i, is the (i,j)th element of the
nth power of P, and is denoted $p_{ij}^{(n)}$. When $\tilde P$ is a random matrix $\tilde p_{ij}^{(n)}$ is
a random variable. In this section we derive expressions for the expected
value of $\tilde p_{ij}^{(n)}$ and the covariance of $\tilde p_{ij}^{(n)}$ and $\tilde p_{\alpha\gamma}^{(\nu)}$, and examine a related
quantity: the expected discounted reward over n transitions. Silver,* using
other methods, has considered the expected value of $\tilde p_{ij}^{(n)}$, assuming
a matrix beta prior distribution for $\tilde P$, and has presented numerical results
for a two-state process.


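Theorem 4.1.1 If the prior distribution function of $\tilde P$ is $F(P|\psi) \in \mathcal{F}$,
a family of distributions closed under consecutive sampling, and

$$\bar p_{ij}^{(n)}(\psi) = \int_{S_N} p_{ij}^{(n)}\, dF(P|\psi), \quad i, j = 1, \ldots, N;\ n = 1, 2, \ldots;\ \psi \in \Psi \tag{4.1.1}$$

is the expected value of $\tilde p_{ij}^{(n)}$, then $\bar p_{ij}^{(n)}(\psi)$ can be computed recursively
from the following equations:

$$\bar p_{ij}^{(n+1)}(\psi) = \sum_{k=1}^{N} \bar p_{ik}(\psi)\, \bar p_{kj}^{(n)}\big(T_{ik}(\psi)\big), \quad i, j = 1, \ldots, N;\ n = 1, 2, \ldots;\ \psi \in \Psi \tag{4.1.2a}$$

$$\bar p_{ij}^{(1)}(\psi) = \bar p_{ij}(\psi), \quad i, j = 1, \ldots, N;\ \psi \in \Psi \tag{4.1.2b}$$

Proof. Since, for n = 1, 2, ...,

$$p_{ij}^{(n+1)} = \sum_{k=1}^{N} p_{ik}\, p_{kj}^{(n)},$$

we have

$$\bar p_{ij}^{(n+1)}(\psi) = \int_{S_N} \sum_{k=1}^{N} p_{ik}\, p_{kj}^{(n)}\, dF(P|\psi)$$

$$= \sum_{k=1}^{N} \bar p_{ik}(\psi) \int_{S_N} p_{kj}^{(n)}\, dF\big(P|T_{ik}(\psi)\big)$$

$$= \sum_{k=1}^{N} \bar p_{ik}(\psi)\, \bar p_{kj}^{(n)}\big(T_{ik}(\psi)\big) \tag{4.1.3}$$

Since $p_{ij}^{(n)}$ is a continuous function of P for i, j = 1, ..., N and
n = 1, 2, ..., the integrals in (4.1.1) and (4.1.3) exist. Q.E.D.

*[33], p. 87.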
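The recursion (4.1.2a) and (4.1.2b) translates directly into a short recursive routine. The sketch below is an illustration under stated assumptions rather than the report's program: it assumes a matrix beta prior with parameters `m[i][j] > 0`, mean $\bar p_{ij}(\psi)$ equal to `m[i][j]` divided by its row sum, and posterior parameter $T_{ik}(\psi)$ obtained by adding one to `m[i][k]`. The function name is hypothetical.

```python
def mean_n_step(i, j, n, m):
    """Mean n-step transition probability via (4.1.2a) and (4.1.2b).

    m[i][j]: assumed matrix beta parameters (list of lists); restored on exit.
    """
    row = m[i]
    total = sum(row)
    if n == 1:
        return row[j] / total                 # (4.1.2b)
    acc = 0.0
    for k in range(len(row)):
        p_ik = row[k] / total
        saved = row[k]
        row[k] = saved + 1                    # posterior parameter T_ik(psi)
        acc += p_ik * mean_n_step(k, j, n - 1, m)
        row[k] = saved
    return acc


# Uniform prior on each row of a two-state chain (illustrative values).
m_uniform = [[1, 1], [1, 1]]
p00_two_step = mean_n_step(0, 0, 2, m_uniform)
```

For this uniform two-state prior the two-step value can be checked by hand: the expected value of $p_{00}^2$ is 1/3 and the expected value of $p_{01} p_{10}$ is 1/4, giving 7/12, which differs from the square of the mean matrix (1/2).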


Theorem 4.1.2 If the prior distribution function of $\tilde P$ is $F(P|\psi) \in \mathcal{F}$, a
family of distributions continuous in $\psi$, then for i, j = 1, ..., N and
n = 1, 2, ..., $\bar p_{ij}^{(n)}(\psi)$ is a continuous function of $\psi$.

Proof. The theorem follows directly from Theorem 2.4.3. Q.E.D.


Theorem 4.1.3 If $\tilde P$ has the distribution function $F(P|\psi) \in \mathcal{F}$, a
family of distributions closed under consecutive sampling, and if

[Displayed definition of the expected product of $\tilde p_{ij}^{(n)}$ and $\tilde p_{\alpha\gamma}^{(\nu)}$ is illegible in the source scan.]

then, for n > 1, $\nu$ > 1,

[Displayed equation illegible in the source scan.]

while, for n = 1 or $\nu$ = 1,

[Equations (4.1.6) and (4.1.7) are illegible in the source scan.]

Proof. The functions $p_{ij}^{(n)}$ are continuous on $S_N$; hence, the
integrals (4.1.5) exist. Applying Lemma 2.3.2 twice, we obtain, for the
case n > 1, $\nu$ > 1,

[Derivation illegible in the source scan.]

and similarly for n = 1 or $\nu$ = 1. Q.E.D.


Let us now consider $\bar q_i^{(n)}(\beta,\psi)$, the prior expected discounted reward
in n transitions when the system starts in state i and $F(P|\psi)$ is the
prior distribution function of $\tilde P$. This expectation will be used in
one of the terminal control models of Chapter 5. Let $q_i^{(n)}(\beta,P)$
be the corresponding expected discounted reward given that $\tilde P = P$. Then

$$\bar q_i^{(n)}(\beta,\psi) = \int_{S_N} q_i^{(n)}(\beta,P)\, dF(P|\psi) \tag{4.1.8}$$





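Theorem 4.1.4 If the distribution function of $\tilde P$ is $F(P|\psi) \in \mathcal{F}$, a
family of distributions closed under consecutive sampling, then $\bar q_i^{(n)}(\beta,\psi)$
can be computed recursively from the following equations:

$$\bar q_i^{(n+1)}(\beta,\psi) = \sum_{j=1}^{N} \bar p_{ij}(\psi)\left[ r_{ij} + \beta\, \bar q_j^{(n)}\big(\beta, T_{ij}(\psi)\big) \right], \quad i = 1, \ldots, N;\ n = 1, 2, \ldots \tag{4.1.9a}$$

$$\bar q_i^{(1)}(\beta,\psi) = \sum_{j=1}^{N} \bar p_{ij}(\psi)\, r_{ij} = \bar q_i(\psi), \quad i = 1, \ldots, N;\ \psi \in \Psi;\ 0 \le \beta \le 1 \tag{4.1.9b}$$

where $R = [r_{ij}]$ is the reward matrix.

Proof. For n = 1, 2, ... and all $P \in S_N$, $q_i^{(n)}(\beta,P)$ satisfies the
recurrence equation,

$$q_i^{(n+1)}(\beta,P) = \sum_{j=1}^{N} p_{ij}\left[ r_{ij} + \beta\, q_j^{(n)}(\beta,P) \right], \quad i = 1, \ldots, N.$$

Then, using Lemma 2.3.2,

$$\bar q_i^{(n+1)}(\beta,\psi) = \int_{S_N} \sum_{j=1}^{N} p_{ij}\left[ r_{ij} + \beta\, q_j^{(n)}(\beta,P) \right] dF(P|\psi)$$

$$= \sum_{j=1}^{N} \bar p_{ij}(\psi)\left[ r_{ij} + \beta\, \bar q_j^{(n)}\big(\beta, T_{ij}(\psi)\big) \right] \tag{4.1.10}$$

which is (4.1.9a). Since $\bar q_i^{(1)}(\beta,\psi)$ is $\bar q_i(\psi)$ as defined by equation
[number illegible in source], equation (4.1.9b) follows. Q.E.D.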
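The recursion of Theorem 4.1.4 has the same shape as that for the mean n-step transition probabilities, with a reward added at each step. The following minimal sketch again assumes a matrix beta prior with parameters `m[i][j] > 0`, mean `m[i][j] / sum(m[i])`, and posterior obtained by adding one to the observed cell; the function name and data are hypothetical.

```python
def mean_discounted_reward(i, n, m, r, beta):
    """Prior expected discounted reward over n transitions from state i,
    computed from (4.1.9a) and (4.1.9b) under an assumed matrix beta prior.

    m[i][j]: prior parameters (restored on exit); r[i][j]: transition rewards.
    """
    row = m[i]
    total = sum(row)
    acc = 0.0
    for j in range(len(row)):
        p_ij = row[j] / total
        future = 0.0
        if n > 1:
            saved = row[j]
            row[j] = saved + 1                # posterior parameter T_ij(psi)
            future = mean_discounted_reward(j, n - 1, m, r, beta)
            row[j] = saved
        acc += p_ij * (r[i][j] + beta * future)
    return acc


# Illustrative data: constant reward 2 on every transition, beta = 0.5.
m_const = [[1.0, 2.0], [3.0, 1.0]]
r_const = [[2.0, 2.0], [2.0, 2.0]]
q_three = mean_discounted_reward(0, 3, m_const, r_const, 0.5)
```

With a constant reward the prior is irrelevant and the result reduces to the geometric sum 2(1 + 0.5 + 0.25) = 3.5, which gives a simple check of the routine.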





For the case $\beta = 1$ another method is available for evaluating
$\bar q_i^{(n)}(1,\psi)$. In a sample of n observations let $f_{ij}$ be the number of
transitions observed from state i to state j and let $F = [f_{ij}]$, an
N x N matrix, be the transition count of the sample. Prior to the
observation of the sample $\tilde F$ is a random matrix and, given the initial
state i, the number of transitions n, and the prior distribution of $\tilde P$,
we can find the distribution of $\tilde F$ unconditional with regard to $\tilde P$. Let
$\bar F = [\bar f_{\alpha j}]$ be the mean of this unconditional distribution. Then

$$\bar q_i^{(n)}(1,\psi) = \sum_{\alpha=1}^{N} \sum_{j=1}^{N} \bar f_{\alpha j}\, r_{\alpha j} \tag{4.1.11}$$

If the prior distribution of $\tilde P$ is the matrix beta distribution, then the
distribution of $\tilde F$ unconditional with regard to $\tilde P$ is the beta-Whittle
distribution, which is discussed in Section 6.5.


4.2 The Steady-State Probabilities

Let P be an ergodic stochastic matrix. Then there is associated with
P a unique vector of steady-state probabilities, $\pi(P) = (\pi_1, \ldots, \pi_N)$,
where $\pi_i = \pi_i(P)$ is the steady-state probability that the system is in
state i (i = 1, ..., N). The vector $\pi$ satisfies the following system
of simultaneous equations,

$$\pi P = \pi \tag{4.2.1a}$$

$$\sum_{i=1}^{N} \pi_i = 1 \tag{4.2.1b}$$

If $\tilde P$ is a random matrix with an arbitrary distribution function,
$F(P|\psi)$, which satisfies a mild continuity condition, we show below that
the subset of non-ergodic matrices in $S_N$ is a set of measure zero. Thus,
it is meaningful to speak of the random vector $\tilde\pi$.

We are concerned, in this section, with the expected value,
$\bar\pi(\psi) = (\bar\pi_1(\psi), \ldots, \bar\pi_N(\psi))$, of $\tilde\pi$. It is shown that this expectation
exists and that $\lim_{n \to \infty} \bar p_{ij}^{(n)}(\psi) = \bar\pi_j(\psi)$. We then assume that $F(P|\psi) \in \mathcal{F}$,
a family of distributions closed under the consecutive sampling rule, and
derive a functional equation for $\bar\pi(\psi)$. Methods of successive approximations
are presented. We conclude with a discussion of the covariance of $\tilde\pi_i$ and $\tilde\pi_j$.











Let us now consider conditions on the prior distribution of $\tilde P$ which
insure the existence of the general joint moment of the elements of $\tilde\pi$,

$$E\Big[ \prod_{i=1}^{N} \tilde\pi_i^{\nu_i} \,\Big|\, \psi \Big] = \int \prod_{i=1}^{N} \pi_i^{\nu_i}\, dF(P|\psi), \quad \psi \in \Psi \tag{4.2.2}$$

where the $\nu_i$ are nonnegative integers.

Let

$$S_N^+ = \{ P \mid P \in S_N,\ 0 < p_{ij} < 1\ (i, j = 1, \ldots, N) \} \tag{4.2.3}$$

be the set of all positive stochastic matrices and, for $0 < \alpha < 1$, define

$$S_N^\alpha = \{ P \mid P \in S_N,\ \alpha \le p_{ij} \le 1 - \alpha\ (i, j = 1, \ldots, N) \}, \quad 0 < \alpha < 1 \tag{4.2.4}$$

We remark that $S_N^\alpha$ is a closed and bounded, hence, compact, subset of
$S_N$ and that $S_N^\alpha \subset S_N^+ \subset S_N$ for any $\alpha$ in the open interval (0,1).

For fixed $\alpha \in (0,1)$, let $S(\alpha)$ be any subset of $S_N$ such that

[Conditions (4.2.5a) and (4.2.5b) are illegible in the source scan.]

Thus, for all $\alpha \in (0,1)$, the boundary of $S_N$ is a proper subset of $S(\alpha)$.
If, for some $\alpha \in (0,1)$, there exists a set $S(\alpha)$ satisfying (4.2.5) such
that $F(P|\psi)$, the prior distribution function of $\tilde P$, is continuous on
$S(\alpha)$, then $F(P|\psi)$ is said to be continuous on the boundary of $S_N$. If $\mathcal{F}$
is a family of distributions indexed by $\psi$ every member of which is
continuous on the boundary of $S_N$, then $\mathcal{F}$ is also said to be continuous
on the boundary of $S_N$.








‘The following lemma shows that contimity of F(P(Y) on the boundary 
= 0 7 
of 4 x is a necessary and enfficient eoniition fer the set of boundary 
ned nto, a a? a a to bo @ set of measure sero. If Ped of then P 


wnealists of a single chain with no trengient states. ‘Tra, the subset oF 
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od bend 
4, whioh inelwics alll perledic ext miltiple-chein trendition matricos, 
ae well as those single-chain transition matrices which have tranaiont 
estates, is contained withts a - 2%. The inport of Lema 4.2.2 io 
thet, provided F(P|'Y) 4s contirmous on the boundary of 2), wo need! 
Gey constiter treneition matelows An A. In this case, wath 


sobabiiity one, TT existe and, ore TT, > 0 ($28, woe Be 


Jemma BeZah If F(P\Y) 40 the prior Gistribution function of 2, 
then @ neoescary and enfficient eoniition that 4 © a ® be a cot of 
measure sero relative to the por distribution ds that F(P\Y) be 
contimnens on the boundary of 4 

Bef. For all ce(0,i), define the sete 

















C, 4) te yz | Peds 0<P4= at. i» fad, eeeg N (3.205) 


Q<a<72 


| | H 
ff ® or < fee 207) 
B,- AoA = By ss, Gyre Oe) Aten 


and, for all ac(6,1), the 
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2 [ exe \4)- {(322.8) 
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J ang |s) z 
aut je C,, 8) 


Ou the Gca<i 


4? (+) 46 the mrginel Usteibation funct! 





On of D 6 Ci xas2 
34 


faxey i 7 fal 4.5425 eeeg 3.2693 
Q< acd 


C,. s(x) 


Assume that, for fixed $\alpha \in (0,1)$, there exists a set $S(\alpha)$ satisfying (4.2.5) on which $F(P \mid \psi)$ is continuous. Let $\epsilon > 0$ be given. Since $F(P \mid \psi)$ is continuous on $S(\alpha)$ we may choose an $\alpha'$ such that $0 < \alpha' < \alpha$ and
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J ane |r) < <i & A sonra < é (2-268) 
Lu-4e ® isi jp 


Since $\epsilon$ is arbitrary, $\mathcal{S}_N - \mathcal{S}_N^{*}$ is a set of measure zero, proving sufficiency.

To demonstrate necessity it suffices to note that, if there does not exist a set $S(\alpha)$ which satisfies the conditions of the lemma, then $F(P \mid \psi)$ must assign positive probability to at least one of the boundary points of $\mathcal{S}_N$. Q.E.D.
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We remark that, dn the case of the matein bota dictribation, the 







< cn < ain seciatoad, integers Mb If, Supthermore, eel Je ae ry 
‘Pee mtv y of cisteibations continuss in Y, then & Li #, "IY } Sea 
mrrtimous function of ti 








BE i ¥, aly js Je rt Ty - dr(P (1). (0.2.12) 





Let $D_{ii}(P)$ be the cofactor of the $i$th diagonal element of the matrix

    $A(P) = I - P$.    (4.2.13)

It can be shown that, for all $P \in \mathcal{S}_N^{*}$,

    $\pi_i(P) = \dfrac{D_{ii}(P)}{\sum_{k=1}^{N} D_{kk}(P)}$,  $i = 1, \ldots, N$.    (4.2.14)

Since $D_{ii}(P)$ is a sum of products involving elements of $P$, $\pi_i(P)$ is a continuous bounded function of $P$ on $\mathcal{S}_N^{*}$ and the integral (4.2.12) exists. When $F(P \mid \psi) \in \mathcal{F}$, a family of distributions continuous in $\psi$, continuity of $E[\pi_i \mid \psi]$ follows from Theorem 2.4.3. Q.E.D.
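The cofactor representation of the steady-state probabilities can be checked numerically. The following is a minimal sketch, assuming a small positive transition matrix with exact rational entries; the function names `det` and `steady_state_by_cofactors` and the example matrix are illustrative and do not come from the report.

```python
from fractions import Fraction

def det(m):
    """Determinant by Laplace expansion along the first row (adequate for small N)."""
    if not m:
        return 1  # determinant of the empty (0 x 0) matrix
    if len(m) == 1:
        return m[0][0]
    total = 0
    for c in range(len(m)):
        minor = [row[:c] + row[c + 1:] for row in m[1:]]
        total += (-1) ** c * m[0][c] * det(minor)
    return total

def steady_state_by_cofactors(P):
    """Return pi with pi_i = D_ii / sum_k D_kk, D_ii a diagonal cofactor of I - P."""
    n = len(P)
    A = [[(1 if i == j else 0) - P[i][j] for j in range(n)] for i in range(n)]
    D = []
    for i in range(n):
        # Delete row i and column i; the cofactor sign (-1)**(i+i) is +1.
        minor = [row[:i] + row[i + 1:] for k, row in enumerate(A) if k != i]
        D.append(det(minor))
    total = sum(D)
    return [d / total for d in D]

# Hypothetical 3-state example (all entries positive, so P is ergodic).
P = [[Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)],
     [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)],
     [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]]
pi = steady_state_by_cofactors(P)
print(pi)
```

Because each cofactor is a polynomial in the entries of the matrix, the sketch also makes the continuity argument concrete: small changes in the entries produce small changes in the result as long as the denominator stays away from zero.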




















Theorem 4.2.2 can be proved under the weaker condition that the set of nonergodic transition probability matrices in $\mathcal{S}_N$ is a set of measure zero, but this criterion is more difficult to apply in practice than that of continuity on the boundary. There are many problems, however, in which it is necessary to assign positive probability to ergodic matrices on the boundary of $\mathcal{S}_N$. For example, in some random walk models $P$ is known to be a Jacobi matrix, i.e., $p_{ij} = 0$ with probability one if $|i - j| > 1$.
Msteivation to the N x 3 gens 
$i$th row of $P$ consists of the elements $p_{i,i-1}$, $p_{ii}$, $p_{i,i+1}$. This technique can be applied to any ergodic transition probability matrix in which some elements are known to be zero.















Ve now establich that, if Rr) 40 


afined by equation (4.4.1), 









Singer [39]. This result was apparently first discovered by
hee an 199% in his deoterel thesis (Rumanian). Cf. Rosenblatt [2%]. 
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chere T(t) = CHC+)o see (19) do the axpected value of 7. Tho 
only assumption which is made about $F(P \mid \psi)$, the prior distribution function of $P$, is that $F(P \mid \psi)$ is continuous on the boundary of $\mathcal{S}_N$. In order to establish (4.2.15) it must first be shown that, for any fixed $\alpha \in (0,1)$, $p_{ij}^{(n)}(P) \to \pi_j(P)$ uniformly in $P$ on $\mathcal{S}_N^{\alpha}$. This is the content of the following two lemmas.


Jama 42:3 For coms Limexl ae(0,1) let 
A (F}e aed ed YP he Beg. (4.2.16) 





be fimotion of Pon J+ Then (2) 40 contamons on J. 
Proof. Let $\epsilon > 0$ be given. It must be shown that, for any fixed


Be ps there exists a > 0 such that |A (2) © 4 (Q)| <¢  whensver 
Qe | ,& psd Pe | <8. let 
a 6h SG a 





“Poy >0 (.2.17} 


RB pUlest clenent of P not equal (ascumings for 
the monent, that such an clexent existe). Choseb = “EN [¢, ¢¥/21. 
Then, for any $Q \in \mathcal{S}_N$, if $\|P - Q\| < \delta$ we have
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i raat be pr Payrod umes the weaker condition that the sot ef 
rx Bie 30 matrices in #y 43 4 set Of Meas 2ETOs using the bourses 
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= 5S) dntersdtiva features of the convergenes pf ta 74 (2) Cm B 

md doses not require a knowledge of measure reat 
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ssunel, for the mommt, to be nonempty. Then, using (2.2.17) axl the 
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(253) 5 4) 


in, ie 4 4 i tas} , (45.2.24) Smni§os that (Byy)eS 43° axl, therefors, 
i at Ps = Pa Tins, 
[A (P) on A (a) = Poy o ap, £8 < Ge (4.2022) 
Sagpese now that there is no amillest element of P not eqnal to Ps 3 
eat: 
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Pay i 1B) - (402626) 











for all $P \in \mathcal{S}_N^{\alpha}$. This will be done by showing that the sequence of functions $\{p_{ij}^{(n)}(P)\}$ can be bounded by a sequence of functions of $P$


asi haw cect 4 Pas nr. Define the functions 
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Ezaki: let « > 0 be given. for any ae(0»i)» 


Mr) ~ mt) < < /P (a) WE ar(P |r) + if ae 27,0?) ante | he 
fa-~Bu (252035) 
2 . quation (4.2.6) ami let F (p|+) be the 
nh al distsibution function of f° Than, noting that Pa (a) 7<2)| 2h, 
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m2 a set Slat) aatiofying eqation (4.2.5) aida that F(P {t+} 46 contimeovs 
on $S(\alpha^{*})$. In equation (4.2.35) let $\alpha < \alpha^{*}$ be chosen such that

Theorem 4.2.5 shows that we can approximate $\pi(\psi)$ by $\bar{p}^{(n)}(\psi)$ using equation (4.1.2). A recursive program was written to carry out this approximation when $P$ has a matrix beta distribution. Some sample computations of $E[P^n \mid \psi]$ are displayed in Tables 4.2.1-4.2.3. We have shown at the base of each table the parameter $M$ of the prior distribution of $P$ and the mean, $\bar{P}$, of this distribution. The matrix $v(P)$ which appears in the table has as its $(i,j)$th element the prior variance of $p_{ij}$. Silver [38] has conjectured that $\pi(\psi)$ can be approximated well by $\pi(\bar{P})$, the steady-state probability vector corresponding to $\bar{P}(\psi)$, the mean of the prior distribution. This approximation is also shown with each table. All work was performed on an IBM 7094 computer.

In Table 4.2.1, where a 2 × 2 transition matrix is considered, it is seen that $\pi(\psi)$ is obtained with three-place accuracy for $n = 5$ and five-place accuracy for $n = 8$. In this instance, $\bar{p}^{(n)}(\psi)$ converges monotonically to $\pi(\psi)$. The total time required to compute the eight entries of Table 4.2.1 was 0.70 minutes.
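The recursive computation of the mean n-step transition probabilities described above can be sketched as follows. This is an illustration under stated assumptions, not the program used in the report: the matrix beta prior is taken to have independent Dirichlet rows with parameter matrix `M`, an observed transition from i to j is assumed to add one to the (i, j) element of `M`, and the recursion conditions on the last transition, which is one valid form of the identity. The function names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache

def mean_matrix(M):
    """Prior mean of P: each row of M divided by its row sum."""
    return tuple(tuple(Fraction(m) / Fraction(sum(row)) for m in row) for row in M)

def update(M, i, j):
    """Posterior parameter after one observed transition i -> j (adds 1 to m_ij)."""
    return tuple(
        tuple(m + 1 if (r == i and c == j) else m for c, m in enumerate(row))
        for r, row in enumerate(M)
    )

@lru_cache(maxsize=None)
def mean_n_step(M, n):
    """E[P^n | M], computed by conditioning on the last transition k -> j."""
    N = len(M)
    if n == 0:
        return tuple(tuple(Fraction(int(i == j)) for j in range(N)) for i in range(N))
    p_bar = mean_matrix(M)
    return tuple(
        tuple(
            sum(p_bar[k][j] * mean_n_step(update(M, k, j), n - 1)[i][k] for k in range(N))
            for j in range(N)
        )
        for i in range(N)
    )

# Hypothetical uniform prior on a two-state chain; M must be a tuple of tuples
# so that it can serve as a cache key.
M = ((1, 1), (1, 1))
print(mean_n_step(M, 2))
```

Each level of the recursion visits a different posterior parameter for every (k, j) pair, so the number of distinct parameters grows quickly with n even with caching. This is in keeping with the sharp growth in computation time reported for the tables.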
In Table 4.2.2, a 2 × 2 transition matrix is treated which has prior variances which are larger than those of the matrix considered in Table 4.2.1. In this instance convergence of $\bar{p}^{(n)}(\psi)$ to $\pi(\psi)$ is much slower
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O.9KS 0.06586 
0.86357 0.13683 


0.93970 0.06630 
0.92027 0.07973 


Io Fase 0.06703 
0.93396  0.0660% 


f 293296 0.06704 
0.932% 0.06756 


0.93293 0.06707 
0.93283 0.06717 


les ar 
0.93290 0.06726 


0293293 0.06707 
0.93293 0.06797 
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Mm = {24.105 9.988 Poo 6.9358 0.0656 
ratee? 0 mr = [0.86357 0043649 


=P) = (0.92955 0.07086) viP) = [0.0038 0.0023 
| as 0.0048 0.0048 
Computation time: 0.70 minutes.
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0126786 0.36824 


0.35185 0.35694 
et 7971 


20385 5.168 


18.265 22102 
1.005 7042 


0.13502 0.29257 


Bg = (0.75236 0.08658 
0.20225 O0.77kkb 


E(p) = (0.92697 0437084 


0.0063 ett 


9.0085 0.0163. 
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aBGe 
and is not monotonic. For $n = 13$, $\bar{p}_{1j}^{(n)}(\psi)$ and $\bar{p}_{2j}^{(n)}(\psi)$ agree only in the first decimal place. The 13 entries of this table took a total of 5.75 minutes to compute.

Some sample computations for a three-state process are shown in Table 4.2.3. Five minutes were required to compute the first eight entries in
this ease. The computation of 5 (?’\an) roqgived 3.26 mimtes. Convergence 
is slow and is not monotonic.

We ranark that the comnatation time of Cap t= 
With p and Mnearly with 0, 
- 











The computation of $\pi(\psi)$ is a problem of some difficulty. To obtain an explicit formula for $\pi(\psi)$ in terms of $\psi$ is even more difficult; this general problem has not yet been solved. Silver [38], assuming a matrix beta distribution for $P$, has calculated $\pi(\psi)$ for various parameters, $M$, using Monte Carlo techniques. He has also shown that, for a two-state chain with one row of $P$ known with certainty and a beta distribution on the other row, the expected value of $\pi_i$ is a Gaussian hypergeometric function. This result is generalized in Section 8.5, where a series expansion of $\pi(\psi)$ is obtained when the 2 × 2 random matrix $P$ has the matrix beta distribution.

One method of approximating $\pi(\psi)$ is to use the ergodic theorem of the last section. A more general basis for the calculation of $\pi(\psi)$ is provided by the following theorem.

















Theorem 4.2.6 If $P$ has the distribution function $F(P \mid \psi) \in \mathcal{F}$, where $\mathcal{F}$ is closed under consecutive sampling and is continuous on the boundary of $\mathcal{S}_N$, then the expectations $\pi_j(\psi)$ simultaneously satisfy the functional equations

    $\pi_j(\psi) = \sum_{i=1}^{N} \pi_i(T_{ij}(\psi))\, \bar{p}_{ij}(\psi)$,  $j = 1, \ldots, N$,  $\psi \in \Psi$    (4.2.40a)

together with

    $\sum_{j=1}^{N} \pi_j(\psi) = 1$,  $\psi \in \Psi$.    (4.2.40b)


Remark. The condition (4.2.40b) is necessary to insure a unique solution to (4.2.40a) since, if $\pi(\psi)$ satisfies (4.2.40a), $c\,\pi(\psi)$ also satisfies (4.2.40a) for all real numbers $c$. Even with this additional constraint we have been unable to prove that $E[\pi \mid \psi]$ is the unique solution to (4.2.40), although we conjecture that this is true.

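Proof. For $j = 1, \ldots, N$ and $\psi \in \Psi$, using (4.2.1a) and Lemma 2.3.2,

    $\pi_j(\psi) = \int_{\mathcal{S}_N} \pi_j(P)\, dF(P \mid \psi) = \sum_{i=1}^{N} \int_{\mathcal{S}_N} \pi_i(P)\, p_{ij}\, dF(P \mid \psi) = \sum_{i=1}^{N} \pi_i(T_{ij}(\psi))\, \bar{p}_{ij}(\psi)$,    (4.2.41)

which is (4.2.40a). Summing (4.2.33) over $j$ yields (4.2.40b). The integrals involved exist by virtue of Theorem 4.2.2 and the continuity of $\pi(P)$ on $\mathcal{S}_N^{*}$. Q.E.D.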





Let the vector function $\pi(n, \psi) = (\pi_1(n, \psi), \ldots, \pi_N(n, \psi))$ be defined by the recursion

    $\pi_j(n, \psi) = \sum_{i=1}^{N} \pi_i(n-1, T_{ij}(\psi))\, \bar{p}_{ij}(\psi)$,  $j = 1, \ldots, N$;  $n = 1, 2, 3, \ldots$;  $\psi \in \Psi$    (4.2.42)

together with a terminal function $\pi(0, \psi) = (\pi_1(0, \psi), \ldots, \pi_N(0, \psi))$ which satisfies the conditions
O< 7, (0, t) <i fad, eeog UH (5.2.24) 
ret 


E Te (Oyt) = &. re P (4.2.43) 
The function $\bar{p}_{ij}(\psi)$ is the $(i,j)$th element of the expected value of $P$ when $F(P \mid \psi)$ is the prior distribution function of $P$.

If $\lim_{n \to \infty} \pi(n, \psi)$ exists, then this limit satisfies equation (4.2.40a). In general, $C(n, \psi) = \sum_{j=1}^{N} \pi_j(n, \psi)$ does not equal unity and, therefore, $\lim_{n \to \infty} C(n, \psi)$ need not equal unity. However, if $\lim_{n \to \infty} \pi(n, \psi)$ exists, then $\lim_{n \to \infty} \pi(n, \psi)/C(n, \psi)$ exists and satisfies equations (4.2.40a) and (4.2.40b). Necessary conditions for the convergence of $\pi(n, \psi)$ have not yet been found; a sufficient condition is given by the following theorem.

























Theorem 4.2.7 Let $P$ have the prior distribution function $F(P \mid \psi) \in \mathcal{F}$, a family of distributions closed under consecutive sampling and continuous on the boundary of $\mathcal{S}_N$. Let $\pi(n, \psi)$ be defined by equation (4.2.42) with the terminal function

    $\pi_j(0, \psi) = p_j$,  $j = 1, \ldots, N$,  $\psi \in \Psi$,    (4.2.45)

where $p = (p_1, \ldots, p_N)$ is a stochastic vector. Then, for $j = 1, \ldots, N$,
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MS. y(t) antota ond de equal to 5 [W¥4\¥]. Moreover, 
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2 Ma Tat) zai. t ¢ £ (8.26885) 


{ai 
and $\lim_{n \to \infty} \pi(n, \psi)$ satisfies equation (4.2.40).


Proof. The theorem is proved by showing that

    $\pi_j(n, \psi) = \sum_{i=1}^{N} p_i\, \bar{p}_{ij}^{(n)}(\psi)$,  $j = 1, \ldots, N$,    (4.2.46)
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aSQo 
from which it follows, by Theorem 4.2.5, that
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e § Cw le). §@lg soon B (4.2.47) 
Equation (4.2.46) is established inductively. For $n = 1$, $\pi_j(1, \psi) = \sum_{i=1}^{N} p_i\, \bar{p}_{ij}(\psi)$ and (4.2.46) holds. Assume it is true for $n$. Then, using equation (4.1.2),

    $\pi_j(n+1, \psi) = \sum_{k=1}^{N} \pi_k(n, T_{kj}(\psi))\, \bar{p}_{kj}(\psi)$
        $= \sum_{i=1}^{N} p_i \sum_{k=1}^{N} \bar{p}_{ik}^{(n)}(T_{kj}(\psi))\, \bar{p}_{kj}(\psi)$
        $= \sum_{i=1}^{N} p_i\, \bar{p}_{ij}^{(n+1)}(\psi)$,    (4.2.48)

proving the assertion. Summing (4.2.46) over $j$, we have

    $\sum_{j=1}^{N} \pi_j(n, \psi) = \sum_{i=1}^{N} p_i \sum_{j=1}^{N} \bar{p}_{ij}^{(n)}(\psi) = 1$.    (4.2.49)
Ras pi ; | 

n retlay , 





By letting $p_i = \delta_{ik}$, for some fixed index $k$, equation (4.2.46) becomes

    $\pi_j(n, \psi) = \bar{p}_{kj}^{(n)}(\psi)$,  $j = 1, \ldots, N$,    (4.2.50)

and it is seen that the approximation of $\pi(\psi)$ by $\bar{p}^{(n)}(\psi)$ is a special case of the method of successive approximations defined by (4.2.42) and (4.2.45).

Another approximation of interest is based upon (4.2.42) and the terminal function $\pi(0, \psi)$ defined by

    $\pi_j(0, \psi) = \pi_j(\bar{P}(\psi))$,  $j = 1, \ldots, N$,    (4.2.51)

where $\pi(\bar{P}(\psi))$ is the steady-state probability vector corresponding to $\bar{P}(\psi)$, the mean of the distribution function $F(P \mid \psi)$. We have not been able to prove convergence of $\pi(n, \psi)$ in this case, but limited computational experience with this approximation, using a matrix beta prior distribution, suggests that convergence does occur and, in some cases, is more rapid than the convergence of $\bar{p}^{(n)}(\psi)$.
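The successive approximation scheme built on (4.2.42) with the terminal function (4.2.51) can be sketched in a few lines. This is an illustration only, not the program of the report: it assumes a matrix beta prior with independent Dirichlet rows and parameter matrix `M`, assumes that an observed transition from i to j adds one to the (i, j) element, and obtains the steady-state vector of the mean matrix by simple power iteration. All function names are hypothetical.

```python
from functools import lru_cache

def mean_matrix(M):
    """Prior mean of P: each row of M divided by its row sum."""
    return tuple(tuple(m / sum(row) for m in row) for row in M)

def update(M, i, j):
    """Posterior parameter after one observed transition i -> j (adds 1 to m_ij)."""
    return tuple(
        tuple(m + 1 if (r == i and c == j) else m for c, m in enumerate(row))
        for r, row in enumerate(M)
    )

def steady_state(P, sweeps=500):
    """Steady-state vector of a positive stochastic matrix by power iteration."""
    N = len(P)
    v = [1.0 / N] * N
    for _ in range(sweeps):
        v = [sum(v[i] * P[i][j] for i in range(N)) for j in range(N)]
    return v

@lru_cache(maxsize=None)
def pi_approx(M, n):
    """n-th approximant: terminal function is the steady state of the mean matrix."""
    N = len(M)
    p_bar = mean_matrix(M)
    if n == 0:
        return tuple(steady_state(p_bar))
    return tuple(
        sum(p_bar[i][j] * pi_approx(update(M, i, j), n - 1)[i] for i in range(N))
        for j in range(N)
    )

def normalized(M, n):
    """Return the approximant divided by its normalizing constant, and the constant."""
    v = pi_approx(M, n)
    c = sum(v)
    return [x / c for x in v], c

# Hypothetical prior parameters (not those of the report's tables).
M = ((20, 2), (3, 5))
print(normalized(M, 1))
```

The normalizing constant returned alongside the vector plays the role of $C(n, \psi)$: it is generally close to, but not equal to, one, which is why the approximant is rescaled before it is compared with the steady-state expectation.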

Some numerical results based on the recursive programming of (4.2.42) with the terminal functions (4.2.51) are displayed in Tables 4.2.4-4.2.6. A matrix beta prior distribution was used in all cases. The program is given in Appendix D.

A two-state process is considered in Table 4.2.4. The transition matrix has the same prior distribution as was used to compute Table 4.2.1, where it was seen that $\pi(\psi) = (0.93293 \;\; 0.06707)$. In Table 4.2.4 the approximation $\pi(n, \psi)$ defined by (4.2.42) is given in column two and the normalizing constant, $C(n, \psi) = \pi_1(n, \psi) + \pi_2(n, \psi)$, is given in column three. In column four it is seen that $\pi(n, \psi)/C(n, \psi) = \pi(\psi)$, with three-place accuracy on the first iteration and four-place accuracy on the second iteration. The eight entries of Table 4.2.4 required 0.62 minutes of computation time on an IBM 7094 machine.

In Table 4.2.5 a 2 × 2 transition matrix is treated which has the same prior distribution as the matrix considered in Table 4.2.2. This is a relatively loose prior distribution and it is seen that convergence of $\pi(n, \psi)/C(n, \psi)$ is slow, although comparison with Table 4.2.2 indicates that this approximation
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(0.93278 0.06722) 
(0.93209 0.06711) 
(0.93292 0.66708) 
(9.93292 0.06708) 
(0.93292 0.06703) 
(0.93292 0.06705) 
(9.93293 0.06707) 
(0.93293 0.06707) 


(0.93302 0.06724) 
(0.93312 0.06713) 
(0.93313 0.06730) 
(0.93312 0.06709) 
(0.93310 0.06709) 
(0.93308 0.06709) 
(0.93307 0.06708) 
(6.93306 0.06708) 


























WM = fibios 0. = pane 0.06536 
| 214.3983 34367 0.86357 0.13643 


HE) = [pstoae  Ocuke 


Computation time: 0.62 minutes.
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Gores 





Fla) otn,4n) c in Gods ) (1991) 
(0.43377 063815) 1.07292 (0.40467 0.59533) 
(5.48823 0.62777) 4.07200 (0.81839 0.58564) 
(O.bkaN7  0,60%93) 2.06860 (0.43585 0456004) 
(0.85883 0.61940) 4.06383 {9.41775  0.5822%) 
(6.48308 0.64632) £05920 (0.0182 9.58376) 
(0.88213 0.62347) 1.05528 (0.88895 0.98105) 
(0.44085 6.61682) 4.05270 (0.41922 0.58079) 
(0.63992 0.68958) 4 08390 (0.82957 0.56043) 
(0.43890 0.60679) 4.04569 (0.21972 0.80028) 

(0.89808 «= s«O . 6052.2) 1.04319 (0.81994 0.58006) 
(0.835725 0.60370) 1.02095 (0.82005 0.57995) 
(0.83656 9.60238) 1.0389% (0.42020 0.57909) 

| (0.43588 0.6012%) 1.03712 (9.82028 0.57972) 
(0.83529 0.60018) 1.03547 (0.42038 0.97962) 
MN « Bes 0.352 Pe ere 0. rt 
}O.647 1.120 0.35952i 0.64879 





vB) = [0.1586 0.1586 


Computation time: 5.03 minutes (n = 1, 2, ..., 12); 25.03 minutes (n = 13, 14).
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f at (ns 7%) {ns a) 


= "3 ie ae —— Soe we eae o Se es ee ee ee es 2 pa Ree tts cay Sree: et Ay At 


(0.32342 0.572% 06.3058} 4.00133 (0.32797 0.37285 (1.085) 
(0.32915 0.37255 0.90142) 1.00202 | (0.92899 0.97070 0.™m08s) 
(9.32963 0.37885 0.30100) 1.00298 (6.32882 0.37093 6.30020) 
, | (0.32990 - 0.97960 0.30123) | 2.00973 | (0.32900 6.37059 6.90083) 
—§ | (0.39007 0.97878 «0.30240) | 1.00288 | (0.32912 0.37068 0.2002) 


wo fp Me 





6 | (0.33087 0.97162 0.30887) | 1.00296 | (0.329% 0.37092 ae 
? | (0.323 0.37164 0.30112) | 2.00299 | (6.329% 0.9705 6.30902) 


8 (0.39025 0.37960 0.30149) | 2.00208 | (0.92027 0.37099 0.50022) 


9 | (0.3302 0.37160 0.30880) | 00296 | (0.52929 0.370 06.2021) 


QM = (38.265 2.802 


, 20920 
2.05 5.368 1 ‘| 


Pa 3.752% 0.08658 G-16206 


Vi?) © 10.0074 0.003% 0.0053) 
0 00063 9) 0222 Q oG291 
0.0085 0.0161 0.0100 
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oFie 
has smaller error than the approximation $\bar{p}^{(n)}(\psi)$. The first twelve iterations required a total of 5.03 minutes, while iterations thirteen and fourteen consumed 25.03 minutes, illustrating the exponential growth of the computation time with $n$.

A 3 × 3 transition matrix which has the same prior distribution as was used in computing Table 4.2.3 is considered in Table 4.2.6. In





tis CASO, (ng j Tn.) ILIV Be 
Two-place accuracy is achieved on the first iteration, with three-place accuracy on the third iteration. The computation time for











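Let

    $\pi_{ij}(\psi) = \int_{\mathcal{S}_N} \pi_i(P)\, \pi_j(P)\, dF(P \mid \psi)$,  $i, j = 1, \ldots, N$    (4.2.52)

be the expected value of $\pi_i \pi_j$ when $P$ has the distribution function $F(P \mid \psi)$. If $F(P \mid \psi)$ is continuous on the boundary of $\mathcal{S}_N$, Theorem 4.2.2 implies the existence of the integral (4.2.52). If $F(P \mid \psi) \in \mathcal{F}$, a family of distributions continuous in $\psi$, then Theorem 2.4.3 implies that $\pi_{ij}(\psi)$ is a continuous function of $\psi$. When $\mathcal{F}$ is also closed under the consecutive sampling rule, the following theorem shows that $\pi_{ij}(\psi)$ can be approximated by the expected value of a product of multi-step transition probabilities.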











Theorem 4.2.8 If the prior distribution function of $P$ is $F(P \mid \psi) \in \mathcal{F}$, a family of distributions continuous on the boundary of $\mathcal{S}_N$ which is closed under consecutive sampling, then
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4 1*3 a T(t). (4.2.53) 
AsPol, $22, eeeg 8 
Tek 


Pmem. Let e> 0 be given. By Lea 4.2.4, fo (n) UAC (2) 17,2) 
uniformly on $\mathcal{S}_N^{\alpha}$ for any $\alpha \in (0,1)$. Arguing as in the proof of Theorem 4.2.5, we may choose $n$ and $\nu$ sufficiently large and $\alpha > 0$ sufficiently


Je Cie Pay 1+) = Fag] <ee (4462.94) 





proving the theorem. QED. 





Theorem 4.2.9 If the prior distribution function of $P$ is $F(P \mid \psi) \in \mathcal{F}$, a family of distributions continuous on the boundary of $\mathcal{S}_N$ which is closed under consecutive sampling, then the product moment $\pi_{ij}(\psi)$ satisfies the following functional equations:
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(740729). (462655) 


> no oN. 
Teddy) 2 © Ff p.(¥) ra) kis 
rine, mie: PTB, 4g Beg OFT Tie Bg Tg 


Legal, eseg N 
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. : TT, A) & i. rer (4.2.55) 
ani go 4 


Remark. The condition (4.2.55b) is necessary to insure a unique solution to the functional equation (4.2.55a). Sufficient conditions for a unique solution have not yet been found.

Proof. Equation (4.2.55a) follows from Lemma 2.3.2. Equation (4.2.55b) follows by summing (4.2.52) over $i$ and $j$. Q.E.D.













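Given $\pi_i(\psi)$, $\pi_j(\psi)$, and $\pi_{ij}(\psi)$, the covariance between $\pi_i$ and $\pi_j$ is computed from the relation

    $\mathrm{cov}[\pi_i, \pi_j \mid \psi] = \pi_{ij}(\psi) - \pi_i(\psi)\, \pi_j(\psi)$,  $i, j = 1, \ldots, N$,  $\psi \in \Psi$.    (4.2.57)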





4.3 Expected Discounted Reward Vector.

Consider a Markov chain which is operated indefinitely under a fixed policy with initial state $i$. If $\tilde{P} = P$, let $V_i(P)$ be the conditional expectation of the total discounted reward earned over an infinite period when the chain starts in state $i$ and let $V(P) = (V_1(P), \ldots, V_N(P))$ be the vector of expected discounted rewards. Howard* has shown that, for any $P \in \mathcal{S}_N$, including periodic and multiple-chain transition matrices,


| {3 (n) 


wipe eS 6 os Fe op ee, (85344) 
Je r= jot ot 15 ER Te 

    $i = 1, \ldots, N$

When $P$ is a random matrix, $V(P)$ is a random vector. In this section the mean and the variance-covariance matrix of $V(P)$ are studied and expressions for the elements of these unconditional expectations are obtained. An equation for the expected value of $V_i(P)$ is derived which is closely related to equation (3.1.5), which was discussed in connection with the discounted adaptive control problem. This relation is used to obtain a method of successive approximations for the numerical calculation of the expected value of $V_i(P)$. Some examples are presented in Section 4.3.3.

* [21], p. 82.















4.3.1 Expected Value of V(P). Let $P$ have the prior distribution function $F(P \mid \psi)$ and let


BtYy)a |v, (pjarcely 4e ceo N (e308 
it APACE |Y) Shy wey (40302) 


be the expected value of $V_i(P)$. The first theorem shows that this expectation exists and provides a formula for $V(\psi)$ in terms of the expected n-step transition probabilities, $\bar{p}_{ij}^{(n)}(\psi)$, when the prior distribution belongs to a family closed under consecutive sampling. A preliminary lemma is required.





Lemma 4.3.1 If $V_i(P)$ is defined by the infinite series (4.3.2) with $0 \le \beta < 1$, then the series converges uniformly in $P$ and $V_i(P)$ is continuous on $\mathcal{S}_N$, $i = 1, \ldots, N$.
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for 0< (<4, = w%(npP) converges uniformly to V,(P) ono, and 
$V_i(P)$ is a continuous function of $P$ on $\mathcal{S}_N$. Q.E.D.













Theorem 4.3.2 Let $P$ have the prior distribution function $F(P \mid \psi)$. Then the expectation $V(\psi)$ defined by equation (4.3.2) exists. If $F(P \mid \psi) \in \mathcal{F}$, a family of distributions closed under the consecutive sampling rule, then

    $V_i(\psi) = \sum_{n=0}^{\infty} \beta^n \sum_{j=1}^{N} \sum_{k=1}^{N} \bar{p}_{ij}^{(n)}(T_{jk}(\psi))\, \bar{p}_{jk}(\psi)\, r_{jk}$,  $i = 1, \ldots, N$,  $\psi \in \Psi$,  $0 \le \beta < 1$,    (4.3.7)

where $\bar{p}_{ij}^{(n)}(T_{jk}(\psi))$ is defined by equation (4.1.4).


the bounded finetion ¥,(2) on 7. Since the infinite series (43.2) 
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in Section 4.4.2 wo shell discuss apprommations to ¥(1) which are 
based on equation (4.3.7). The results of the following paragraph provide a different basis for computation of $V(\psi)$.
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y, ( Y}, tho maximus emected discounted reward discussed in connection 
with the adaptive control problem.





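Theorem 4.3.3 If $P$ has the prior distribution function $F(P \mid \psi) \in \mathcal{F}$, a family of distributions closed under consecutive sampling, then $V(\psi) = (V_1(\psi), \ldots, V_N(\psi))$ satisfies the functional equation

    $V_i(\psi) = \bar{q}_i(\psi) + \beta \sum_{j=1}^{N} \bar{p}_{ij}(\psi)\, V_j(T_{ij}(\psi))$,  $i = 1, \ldots, N$,  $\psi \in \Psi$,  $0 \le \beta < 1$,    (4.3.9)

where $\bar{q}_i(\psi)$, the expected one-step transition reward when in state $i$, is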





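Letting

    $q_j(P) = \sum_{k=1}^{N} p_{jk}\, r_{jk}$,  $j = 1, \ldots, N$,    (4.3.10)

equation (4.3.2) can be written

    $V_i(P) = \sum_{n=0}^{\infty} \beta^n \sum_{j=1}^{N} p_{ij}^{(n)}\, q_j(P) = q_i(P) + \sum_{n=1}^{\infty} \beta^n \sum_{k=1}^{N} \sum_{j=1}^{N} p_{ik}\, p_{kj}^{(n-1)}\, q_j(P)$,  $i = 1, \ldots, N$.    (4.3.11)

Thus, changing the index of summation and using Lemma 2.3.2,

    $V_i(\psi) = \bar{q}_i(\psi) + \beta \sum_{j=1}^{N} \int_{\mathcal{S}_N} p_{ij}\, V_j(P)\, dF(P \mid \psi)$
        $= \bar{q}_i(\psi) + \beta \sum_{j=1}^{N} \bar{p}_{ij}(\psi)\, V_j(T_{ij}(\psi))$.    (4.3.9)

Q.E.D.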
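For a fixed, known transition matrix the two forms of the discounted reward used in the proof, the infinite series and the one-step relation $V_i(P) = q_i(P) + \beta \sum_j p_{ij} V_j(P)$, can be compared numerically. The sketch below is illustrative only; the function names and the example transition and reward matrices are assumptions and are not taken from the report.

```python
def one_step_rewards(P, R):
    """q_j(P) = sum_k p_jk * r_jk, the expected reward of one transition from j."""
    return [sum(p * r for p, r in zip(P[j], R[j])) for j in range(len(P))]

def discounted_reward_series(P, R, beta, terms=200):
    """Truncated series: sum over n of beta^n * sum_j p_ij^(n) * q_j."""
    N = len(P)
    q = one_step_rewards(P, R)
    power = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]  # P^0
    V = [0.0] * N
    for n in range(terms):
        for i in range(N):
            V[i] += beta ** n * sum(power[i][j] * q[j] for j in range(N))
        # Advance to the next power of P.
        power = [[sum(power[i][k] * P[k][j] for k in range(N)) for j in range(N)]
                 for i in range(N)]
    return V

def discounted_reward_fixed_point(P, R, beta, sweeps=200):
    """Iterate V <- q + beta * P V starting from zero."""
    N = len(P)
    q = one_step_rewards(P, R)
    V = [0.0] * N
    for _ in range(sweeps):
        V = [q[i] + beta * sum(P[i][j] * V[j] for j in range(N)) for i in range(N)]
    return V

# Hypothetical two-state example with discount factor 0.2.
P = [[0.5, 0.5], [0.4, 0.6]]
R = [[9.0, 3.0], [3.0, -7.0]]
print(discounted_reward_series(P, R, 0.2))
print(discounted_reward_fixed_point(P, R, 0.2))
```

The two routines agree because the m-th iterate of the fixed-point scheme started from zero equals the series truncated after m terms, which is the index shift used in the proof.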


Equation (4.3.9) has the same form as equation (3.2.5) and may be interpreted as a discounted adaptive control equation in which there is exactly one alternative in each state. The results of Sections 3.2 and 3.3 apply and are summarized in the following theorem, which is stated without proof, since the proofs in the more general case of many alternatives in each state are given in Chapter 3.














Theorem 4.3.4 There exists a unique bounded vector function, $V(\psi) = (V_1(\psi), \ldots, V_N(\psi))$, which satisfies equation (4.3.9). Let the sequence of functions $\{V_i(n, \psi)\}$, $i = 1, \ldots, N$, be defined by the








    $V_i(n+1, \psi) = \bar{q}_i(\psi) + \beta \sum_{j=1}^{N} \bar{p}_{ij}(\psi)\, V_j(n, T_{ij}(\psi))$,    (4.3.12)
    $i = 1, \ldots, N$;  $n = 0, 1, 2, \ldots$;  $\psi \in \Psi$;  $0 \le \beta < 1$
q (0, Vv) a VW C+). isi, eoagii (265 08350) 
2 4 vet 





Pryce 5 peovaded the terminal fimetion ¥, +} are boumxioed, 
Wasa | Zzv 42%, coop N (Moo Xl} 
ye > 





YO) & Vy bie eeFg N €43.63045) 


dag vos rank At va Thy NY -™ aed g Jpent r= tod Past 


then the error of the $n$th approximant,

    $e_i(n, \psi) = V_i(\psi) - V_i(n, \psi)$,    (4.3.16)

has the bound





oi) 
k (n 12) < 8° C max esie dss wee ae (3.5.47% 
“4 &p = § a g ee * Ca See FG 
Seth, coon N 
weO,1, geee 
ref 
O<A<4 
Furthermore, if
Ve py <r, (503088) 


then $\{V_i(n, \psi)\}$ is a monotone increasing sequence with limit $V_i(\psi)$ and the bound (4.3.17) becomes








0<G,(m+)< BC gle v*). (4.3049) 
Sai, eeeg 8 
MO elagegves 
ve F 
O<8<f 
mi we pad By >. R (5.320) 










then $\{V_i(n, \psi)\}$ is a monotone decreasing sequence with limit $V_i(\psi)$ and


p@ F or = v3 < @, (ns +) <= Oe 421, woes (2365523) 
FO plg2geee 
@ 
Q<8<i 


for various policies in a two-state Markov chain with two alternatives in each state when $P$ has a matrix beta distribution. The discount factor is $\beta = 0.2$. The reward matrix and the prior distribution are the same as


COCR EE: RT 
—_ 


See Appendix B for the program listing.











Bs G.20 


©1030 





.* V(r M1) Ol re DW) A(n) 
; 31595 SE we in 
: 31594 0.082 pas 
; 3580 3.988 < 
>) a 
& 9-994 0.003 0.019 
5 9.933 9.004 0.008% 
6 a 0.004 
7 A 0.000 
8 9-998 9.000 
9 pe 0.000 
Poli: (41). W(0542) wa Bet 

5.596 
Computation time: 0.56 minutes.








n Ving) 
9 20.253 
13.278 
3 10.253 
13.276 
2 £0 £470 
23.586 
3 40.507 
13,652 
hy £0. 546 
13.665 
5 40.537 
6 40.538 
43.663 
7 20.548 
13.668 
8 20.548 
43.658 
8 26.528 
23.663 


Policy : (1,2). 


oi }}- 





Be 0.2, 


7 


W0,%) = [0.259 
a) = [30-882 


0.078 
0.016 
0.903 
0.004 
0.000 
0.000 


4) eS00 








Poiiay: 


Ys 77) 


$616 
573 


$636 
5473 


50585 
5029 


50575 
50287 


50 57% 
5 oleh 


Se No 
5,244 


Jeo 
5.584 


5059 


ASIN 


50573 
5.444 


5.573 
S.h8h 


(253). 


oh Othe 


=) 043 
af 099 


=§ £043 
059 


0.042 
9.035 


~0.005 
9.003 


~~) 004 
6.060 


0.090 
0.000 


0.600 
6.000 


0.000 
0.000 


9,809 


0 C00 


Hoa = Gag 


Conprtation thze: 0.55 230 





e(np Mt) 


0.000 


373 


— — ae a 





@3) S 


n Vir %) Olt» 1) A(n) 


re art ne RT ied eth a a eo 





0 6.042 0.042 13.958 
12-708 0.389 

| GeOB2 9.042 20192 
42.707 02390 

z 5.988 #§ 12 0.558 
13.043 0.0K 

b 6.000 0.000 0.022 
25.095 6.002 
13.097 9.066 

6 6.000 3.006 6.002 
13.057 0.060 

6.000 0 oO 0 GOO 
13.097 5 O00 

g 6.000 ©.000 0.000 
13.007 0 000 

9 6.000 0.000 0.000 
43.097 0.080 

Follay: (252). ¥(0,47) = [ 6.042 

ee 42.708 


Conny tation time: 
8 3 0.2. 
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thsse use! In axymtineg Tables 3.5.1 ¢ 3.5.9 for the adaptive oontrel 
problen (seo equations (9.5.4) = (3.50%)}. For each of the four possible 
policies, the terminal functions are given by

    $V_i(0, \psi) = V_i(\bar{P}(\psi))$,  $i = 1, 2$,

the expected discounted reward starting from state $i$ when the transition matrix is equal to its prior mean.










iteration when accuracy to the third decimal place is desired. The computation time indicated at the end of each table is the total time required to calculate all nine iterations on an IBM 7094 computer.

Also displayed in each table is the error vector,

    $e(n, \psi) = V(\psi) - V(n, \psi)$,

taking $V(\psi) = V(9, \psi)$. The last column of each table contains values of


A (a) = pe [naz {ee on Ve y a oa ye 

the error bound of equation (4.3.17).

Comparing these calculations with those of Tables 3.5.1 - 3.5.3, it is seen that, in this example, the adaptive control problem and the problem of finding the terminal policy which maximizes $V_i(\psi)$ both have the same optimal policy and the same total expected reward.


















imie this seotion with o 
a for tha ne of ¥ A) el V. 4{B)- This equation 4ncelscs 
8 of the form 5 rey fn, thich can bo computed with the aid of 
m l.4630 Approxinations to cov [¥,(B), ¥ 4B 4} are considered 
in Seoti.on 4.4.2. 





Theorem 4.3.5 If $P$ has the distribution function $F(P \mid \psi) \in \mathcal{F}$, a family of distributions closed under consecutive sampling, then the covariance between $V_i(P)$ and $V_j(P)$ is given by









ae as 6 ~ et N 
cov [V5 (Ps V2 113 a = = B £ Batt) 


¥ 
Gy yk ymad ak" By 


gH | a CDT Bt CFD) = BMC chy Bee ct Bt 


todo? oee9 B (4.3.22) 
Ye |& 
0<8<1 


Proof. The expected value of the product $V_i(P)\, V_j(P)$ is


EtYU® v@lt1 = { EE wylmg) wlvP P(e \Y)» (8443625) 
. me0 vet) 


al 


w, (pP) s p* : : nit a. {03 RN (8,9,2%) 
7. ‘ | > eney @ JOe 
F om yet 2 “akTok” eo 


gigcgeee 
Ped 
A ae & 4d wale ire then 
| w, (mpP) wy(v9P)| < (Rn)? gry, i, ji, eoey I (8.9525) 
- = Rly, pegeeco 
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(ye se pm Oe Bed — (43,80) 
mm «wh 





the double sum in the integrand of (4.3.23) converges uniformly to


c Pp v 362? ond 1 TUNG s 





ae oo 68, N (mn) (29) 

ety. (@v@lrje rn ess f pp Pans Pop (BT 

+e m= ue) Geyskanet “ Paes Pate? ges Many 's 
of (13,3.27) 
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(4.9.29) 
from (4.3.28), equation (4.3.22) is obtained. Q.E.D.
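Theorem 4.3.6 If $F(P \mid \psi) \in \mathcal{F}$, a family of distributions continuous in $\psi$, then $V_i(\psi)$ and $\mathrm{cov}[V_i(P), V_j(P) \mid \psi]$ are continuous functions of $\psi$ on $\Psi$ $(i, j = 1, \ldots, N)$.

Proof. Lemma 4.3.1 implies that the bounded function $V_i(P)$ is integrable on $\mathcal{S}_N$. Thus, by Theorem 2.4.3, $V_i(\psi)$ and $E[V_i(P)\, V_j(P) \mid \psi]$ are continuous on $\Psi$. Q.E.D.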


Consider a Markov chain operating indefinitely under a fixed policy. The conditional expected reward per transition, given that $\tilde{P} = P$ is an ergodic transition matrix, is

    $g(P) = \sum_{i=1}^{N} \pi_i(P) \sum_{j=1}^{N} p_{ij}\, r_{ij}$,    (4.4.1)

which is known as the gain of the process. When $P$ is a random matrix with the distribution function $F(P \mid \psi)$, which is assumed to be continuous on the boundary of $\mathcal{S}_N$, then $g(P)$ is a random variable. The mean and variance of $g(P)$ are investigated in this section, assuming that $F(P \mid \psi)$ belongs





Bayt hay Fay 10 Me, SOMES | ocr C21. 


2 © 2 2 | ey y ~ pe — fd @ 3). 
BAS vt) : as oh B Ty ptt Bayt ey Fe Pha Tay IB, | hy 
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aidG= 
to a family of distributions closed under the consecutive sampling rule.
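For a fixed ergodic transition matrix the gain is the steady-state average of the one-step expected rewards. The following minimal sketch computes it for a known matrix; the function names, the use of power iteration for the steady-state vector, and the example numbers are assumptions made for illustration and are not taken from the report.

```python
def steady_state(P, sweeps=200):
    """Steady-state vector of a positive stochastic matrix by power iteration."""
    N = len(P)
    v = [1.0 / N] * N
    for _ in range(sweeps):
        v = [sum(v[i] * P[i][j] for i in range(N)) for j in range(N)]
    return v

def gain(P, R):
    """g(P) = sum_i pi_i(P) * sum_j p_ij * r_ij."""
    pi = steady_state(P)
    return sum(
        pi[i] * sum(P[i][j] * R[i][j] for j in range(len(P)))
        for i in range(len(P))
    )

# Hypothetical two-state example.
P = [[0.5, 0.5], [0.4, 0.6]]
R = [[9.0, 3.0], [3.0, -7.0]]
print(gain(P, R))
```

When the transition matrix is random, this function is the integrand whose mean and variance are studied in the remainder of the section.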

















4.4.1 Mean and Variance of g(P). Let the expected value of $g(P)$ be

    $g(\psi) = \int_{\mathcal{S}_N} g(P)\, dF(P \mid \psi)$.    (4.4.2)

Equation (4.4.1) shows that $g(P)$ is continuous and bounded, hence integrable, on $\mathcal{S}_N^{*}$. If $F(P \mid \psi)$ is continuous on the boundary of $\mathcal{S}_N$, then Lemma 4.2.1 implies the existence of the integral (4.4.2).


Theorem 4.4.1 If $P$ has the distribution function $F(P \mid \psi) \in \mathcal{F}$, a family of distributions continuous on the boundary of $\mathcal{S}_N$ which is closed under consecutive sampling, then the expected gain, $g(\psi)$, is given by

    $g(\psi) = \sum_{i=1}^{N} \sum_{j=1}^{N} r_{ij}\, \bar{p}_{ij}(\psi)\, \pi_i(T_{ij}(\psi))$,  $\psi \in \Psi$,    (4.4.3)

where $\pi_i(\psi)$ is defined by equation (4.2.33).

Proof. By equation (4.4.1),

    $g(\psi) = \sum_{i=1}^{N} \sum_{j=1}^{N} r_{ij} \int_{\mathcal{S}_N} \pi_i(P)\, p_{ij}\, dF(P \mid \psi)$.    (4.4.4)

Application of Lemma 2.3.2 yields (4.4.3). Q.E.D.


Theorem 4.4.2 If $P$ has the distribution function $F(P \mid \psi) \in \mathcal{F}$, a family of distributions continuous on the boundary of $\mathcal{S}_N$ which is closed under consecutive sampling, then the variance of $g(P)$ is

    $\mathrm{var}[g(P) \mid \psi]$


F ary wn: 2 Dr, , , wat han My 40) ae Ten B OTD Pg (Ty IT + 
re & (4.4.5) 





where $\pi_{ij}(\psi)$ is defined by equation (4.2.52).
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Prank» The wean square of g{P) de, using Lemma ?.5.2 and equation (oP. 
<3 


fy 
2 0s | 
Eg (P)| y,js z = Pen i TH Tg SP PCP IY) 
a 


Lede 


: 
” sa doieomea "AS" nas Aen — Wy MAE | HL Ty GO 
be 


. 
Tater ack Fe Puna Pen 4 OT SN fn. 


(5.%.6) 

The existence of $E[g^2(P) \mid \psi]$ follows from Lemma 4.2.1 and the continuity of the bounded function $\pi_i(P)\, \pi_k(P)\, p_{ij}\, p_{km}$ on $\mathcal{S}_N^{*}$. From (4.4.3), the square of the mean of $g(P)$ is


* i 

: = re P q «4 

Cec} rate Tatas TBs CHF, ( yg TO C4). 
(23.1477) 


Subtracting (4.4.7) from (4.4.6), equation (4.4.5) is obtained. Q.E.D.
gion Bebe} If Phas the diatritation function F(P\1)e%, 


Pes . of distribetions eontimous on the boundary of 4, which is 


tinuces funetions of ¥ on xs 
Proof. The theorem follows immediately from Theorem 2.4.3. Q.E.D.


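covariance matrix of the discounted reward vector $V(P)$ discussed in Section 4.3. We assume throughout that the prior distribution function of $P$ is $F(P \mid \psi) \in \mathcal{F}$, a family of distributions continuous on the boundary of $\mathcal{S}_N$ which is closed under consecutive sampling.

The expected value of $V_i(P)$ is $V_i(\psi)$, which is given by equation (4.3.7) in terms of the mean n-step transition probabilities $\bar{p}_{ij}^{(n)}(\psi)$. Since $\lim_{n \to \infty} \bar{p}_{ij}^{(n)}(\psi) = \pi_j(\psi)$, we can replace $\bar{p}_{ij}^{(n)}(T_{jk}(\psi))$ by $\pi_j(T_{jk}(\psi))$ in (4.3.7) for all $n$ larger than some integer $n^{*}$ to obtain

    $V_i(\psi) \approx \sum_{n=0}^{n^{*}} \beta^n \sum_{j=1}^{N} \sum_{k=1}^{N} \bar{p}_{ij}^{(n)}(T_{jk}(\psi))\, \bar{p}_{jk}(\psi)\, r_{jk} + \sum_{n=n^{*}+1}^{\infty} \beta^n \sum_{j=1}^{N} \sum_{k=1}^{N} \pi_j(T_{jk}(\psi))\, \bar{p}_{jk}(\psi)\, r_{jk}$    (4.4.8)

or, using (4.4.3),

    $V_i(\psi) \approx \sum_{n=0}^{n^{*}} \beta^n \sum_{j=1}^{N} \sum_{k=1}^{N} \bar{p}_{ij}^{(n)}(T_{jk}(\psi))\, \bar{p}_{jk}(\psi)\, r_{jk} + \dfrac{\beta^{n^{*}+1}}{1 - \beta}\, g(\psi)$,    (4.4.9)
    $i = 1, \ldots, N$;  $\psi \in \Psi$;  $0 \le \beta < 1$.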


The error incurred when the approximation (4.4.9) is used is


s 


a0 i 
atat)2 Zp ns BEG OD) = TTS CNB 1s 


nen+$ fat ko i Be 
(44680040) 
ist, sess 3 
ee ae pees 
O<£B< 1 
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s(n) o Wil? 04.44 
By Te CHD) = C0 e610] 5 ts ( ) 
4,jei, eeen fi 
TO 1 2y 000 
Yet 


a(n", +) can be bounded by 
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| £ai, sees 
 DO—l,Zeeee 
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a(*)2 ry p C+ de Zn seog 
ts ar O ge Je ae © 








Se the mean one-step transition reward. 

The bound (4.4.12) is conservative; tighter bounds require a tighter bound on $|\bar p_{ij}^{(n)}(\psi) - \bar\pi_{ij}(\psi)|$ than that provided by (4.4.11). This is a problem for future investigation.

The covariance between $V_i(\tilde P)$ and $V_j(\tilde P)$ is given by equation (4.3.22).


By Theorem 4.2.8,
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~-— ace + Js Weel) hg coon BR CHT 





For all $n > n'$ and $\nu > \nu'$ let us use the approximations

— (vy) = ot a 

8 BESS | Tay Mate 1 ID & Teg May (Bay (4 2) (atte) 

Be (Tal YB GST, C19) 8 Hay PIIECT CF) 
(ale 5) 


in equation (4.3.22). Using (4.4.5), we then have


we (V,(2), VIP }¢] 2  & il ba (+) 
ev (0. (2), VIDIt1s s =a pp z o. (bre 
ss 5s re yal) Gyypkyaet ale my 
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‘ar (sti. SeJete veda tt (15405046) 


The error involved in using (4.4.16) to appr
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(3.0.47) 

eneervative bound on oy (avs 1) 4s 
in? v's 1) ae ts ch | BCT (+ +] - 
eT CEB” agysltoumt Fox * Pry Bay’ (TEI) +P 17) 
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Abali.on Thoepas. We conclude this chapter with a theerem which 
relates $\bar g(\psi)$ to $\bar V_i(\beta, \psi)$. Some results from the theory of summation of divergent series are required and are summarized here without proof.
Let $\{a_n\} = a_0, a_1, a_2, \ldots$ be a sequence of real numbers. Let

$$ s_n = \frac{1}{n+1} \sum_{k=0}^{n} a_k, \qquad n = 0, 1, 2, \ldots \tag{4.4.18} $$

If $\lim_{n \to \infty} s_n$ exists and is equal to $t$, then the sequence $\{a_n\}$ is said to be Cesàro-summable, or (C,1)-summable, to $t$. If the (C,1)-sum of $\{a_n\}$ exists and is equal to $t$, then $\lim_{\beta \to 1} (1-\beta) \sum_{n=0}^{\infty} \beta^n a_n$ exists and is equal to $t$.
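The two kinds of averaging used here can be compared numerically. The short Python sketch below is an illustration only; the function names `cesaro_means` and `abel_mean` are hypothetical, and the sums are truncated to finitely many terms.

```python
def cesaro_means(seq):
    """Return the running (C,1) means s_n = (a_0 + ... + a_n) / (n + 1)."""
    means, total = [], 0.0
    for n, a in enumerate(seq):
        total += a
        means.append(total / (n + 1))
    return means

def abel_mean(seq, beta):
    """Return (1 - beta) * sum_n beta**n * a_n over the supplied terms."""
    total, weight = 0.0, 1.0
    for a in seq:
        total += weight * a
        weight *= beta
    return (1.0 - beta) * total

# The sequence 1, 0, 1, 0, ... has no ordinary limit,
# but both kinds of average settle near 1/2.
a = [1.0 if n % 2 == 0 else 0.0 for n in range(10000)]

if __name__ == "__main__":
    print(cesaro_means(a)[-1])   # (C,1) mean of the first 10000 terms
    print(abel_mean(a, 0.99))    # discounted average with beta = 0.99
```

For this sequence the discounted average equals $1/(1+\beta)$ up to truncation, which tends to $1/2$ as $\beta \to 1$, in agreement with the (C,1) mean.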


Theorem 4.4.4 Let $\tilde P$ have the distribution function $F(P \mid \psi)$, a member of a family of distributions continuous on the boundary of $\mathcal{S}$ which is closed under the consecutive sampling rule. Let $\bar V_i(\beta, \psi)$ and $\bar g(\psi)$ be defined by equations (4.3.2) and (4.4.2), respectively. Then

$$ \lim_{\beta \to 1} (1-\beta)\, \bar V_i(\beta, \psi) = \bar g(\psi), \qquad i = 1, \ldots, N. \tag{4.4.19} $$

* See, for example, Knopp [27].
te 


Proof. We first show that the (C,1)-sum of the sequence $\{\bar p_{ij}^{(n)}(\psi)\}$ exists and is equal to $\bar\pi_{ij}(\psi)$ for $i, j = 1, \ldots, N$ and any $\psi \in \Psi$. Let


ts(aet) © shy = ace). Lagat seve H (liatect) 


wags <p ee 


Let $\varepsilon > 0$ be given. Choose $n'$ such that, for fixed indices i and j and fixed $\psi$,

#(n) oe < & e I, 62897 
be (1) Wty oe n> Hh (4.4.82) 


al o 


ty ginot) = TCH < aby os CK) = Cty 
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(4408.23) 
Ry choosing an integer i n such that, if n>KX , 
cube = [nt¥ (2 the Bs 
ar Be Cr) = Fry] <B> a 
we have; for n ' 
Ita (n,) +) | Ze (e259 


asd, therefore, oa. % 4h YY) s 7 Y)> proving the assertion. 
Using equation (4.307) » 


§ 
, (4-8 g(Botjie = D. ( Vr 
J 


W 
B a(n) 
= ) = T 
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Ceosumbility of {Oy ce to TCTs, ({°)) amplies that 


= eB Ber ()) catate and as equat to Tit, (1). 


Bos (FP) 3 


Thus, using equation (4.4.3),
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a> a (48) pp an Bo z res MPa, ( Fs. 


2 atv). a K (4,%,277) 
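The limit in this theorem is the prior-expectation version of a familiar fact for a single known chain: $(1-\beta)$ times the discounted reward approaches the gain as $\beta \to 1$. The Python sketch below illustrates only that single-chain fact; the chain, the rewards and the helper names `solve` and `discounted_values` are hypothetical choices made for the illustration.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]

def discounted_values(P, q, beta):
    """Solve V = q + beta * P V for the discounted reward vector V."""
    n = len(q)
    A = [[(1.0 if i == j else 0.0) - beta * P[i][j] for j in range(n)]
         for i in range(n)]
    return solve(A, q)

P = [[0.9, 0.1], [0.4, 0.6]]    # hypothetical two-state chain
q = [1.0, 3.0]                  # expected one-step rewards
GAIN = 0.8 * 1.0 + 0.2 * 3.0    # the stationary vector of P is (0.8, 0.2)

if __name__ == "__main__":
    for beta in (0.9, 0.99, 0.999):
        V = discounted_values(P, q, beta)
        print(beta, [(1.0 - beta) * v for v in V], GAIN)
```

As $\beta$ increases toward one, both components of $(1-\beta)V$ move toward the gain, whatever the starting state.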
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CHAPTER 5 
TERMINAL CONTROL PROBLEMS








chain with alternatives in which there is an explicit sampling cost. This leads to a distinction between sampling the process and using the process. When the process is sampled the sequence of states occupied by the Markov chain during the sampling period is made known to the decision-maker, who then uses this information to update his prior distribution on $\tilde{\mathcal P}$. During the sampling period the process earns transition rewards as specified by $\mathcal R$ and sampling costs are incurred.
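The updating step can be made concrete under one common assumption. The Python sketch below supposes that each row of the transition matrix has an independent Dirichlet prior, so that the prior parameter is simply a table of counts and observing a transition adds one to the corresponding entry. This particular family, and the names `observe` and `mean_matrix`, are assumptions of the sketch rather than statements from the text; a single alternative per state is used for brevity.

```python
def observe(counts, path):
    """Return the posterior counts after observing the state sequence `path`.

    counts[i][j] is the (assumed) Dirichlet parameter for a transition i -> j.
    """
    new = [row[:] for row in counts]
    for i, j in zip(path, path[1:]):
        new[i][j] += 1
    return new

def mean_matrix(counts):
    """Mean transition matrix implied by the counts."""
    return [[m / sum(row) for m in row] for row in counts]

prior = [[1, 1], [1, 1]]
posterior = observe(prior, [0, 1, 1, 0])   # transitions 0->1, 1->1, 1->0

if __name__ == "__main__":
    print(posterior)
    print(mean_matrix(posterior))
```

Because the posterior has the same form as the prior, the same two functions can be applied again after each further block of observations.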
On the other hand, if the process is used over a period of n transitions, it earns transition rewards, but the decision-maker is permitted to know only the initial state and the final state of the sample sequence. The only sample cost incurred is that for observing the state at the end of the period.
It is reasonable to expect that, after a finite amount of sampling, the prior distribution of $\tilde{\mathcal P}$ will be sufficiently tight that the best course of action for the decision-maker will be to cease sampling and


















sections we show that such a terminal decision point occurs with probability one in an optimal sampling strategy.

Terminal control models are applicable, in general, to any Markov chain with alternatives in which rewards are earned independently of the decision-maker's knowledge of the sequence of states occupied by the system and in which it is possible for him to determine the state of the system at any time, for a nonzero cost. A specific example of such a process is a Markov chain model of consumer brand-switching behavior, where a survey must be made to determine the current state of the market.


















view by Wetherill [40]. A similar problem with Markov
red by Bhat [9].

In Section 5.1 we outline Model I, a discounted terminal control model in which the decision-maker must sample at every transition of the process until a terminal decision point is reached. This model is formulated as a set of functional equations and it is shown that a terminal decision point is reached with probability one in an optimal sampling strategy.

It is shown, in Section 5.2, that there exists a unique solution to these equations and a method of successive approximations is introduced. This model is generalized in Section 5.3, where Model II is introduced. Model II is a model in which the decision-maker may either sample or use the process until a terminal decision is made.























Consider a Markov chain with alternatives which has the reward matrix $\mathcal R = [r_{ij}^k]$. At each transition the decision-maker can either sample the process or choose a terminal policy under which the system is to be operated indefinitely. Let $c_i > 0$ be the cost of observing the system and finding it in state i ($i = 1, \ldots, N$). The cost of any sampling strategy is, therefore, a random variable before it is executed. Assuming that the interval between transitions is constant, we may use this interval as the unit of time. Let $\beta$ be the present value of a unit reward received one unit of time in the future ($0 < \beta < 1$). We shall seek a sampling strategy which maximizes the expected total discounted reward over an infinite period.

When the decision-maker chooses to sample we clearly have a case of consecutive sampling. Thus, it is assumed that the prior distribution function of $\tilde{\mathcal P}$ is $H(\mathcal P \mid \psi) \in \mathcal H$, a family of distributions closed under consecutive sampling. Let $(i, \psi)$ denote the generalized state of the system ($i = 1, \ldots, N$; $\psi \in \Psi$) and let $v_i(\psi)$ be the supremum of the expected total discounted reward over an infinite period if the system starts from the generalized state $(i, \psi)$. In Theorem 5.1.1 it will be shown that an optimal sampling strategy exists and, therefore, that $v_i(\psi)$ is the maximum expected discounted reward over an infinite period.

If, when in state $(i, \psi)$, it is decided to sample the kth alternative and the system then makes a transition to state j, the supremum of the posterior expected reward is














* Model I is easily generalized to the case in which the cost of observing a transition from state i to state j under the kth alternative is $c_{ij}^k$, with equation (5.1.3) below modified accordingly. Model II, however, requires that the sampling cost be independent of the state from which the transition originated.
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oO 





Ste 

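The probability of the sample outcome j, unconditional with respect to the prior distribution, given that the system is in state $(i, \psi)$ and alternative k is in use, is the prior expected value of $\tilde p_{ij}^k$,

$$ \bar p_{ij}^k(\psi) = \int p_{ij}^k \, dH(\mathcal P \mid \psi). \tag{5.1.2} $$

Let $\bar q_i^k(\psi)$ be the mean one-step transition reward defined in Section 3.3 and let

$$ \bar c_i^k(\psi) = \sum_{j=1}^{N} \bar p_{ij}^k(\psi)\, c_j \tag{5.1.3} $$

be the expected cost of sampling alternative k when in the state $(i, \psi)$. Then, if it is decided to sample, the supremum of the prior expected reward is

$$ \max_{k} \Big\{ \bar q_i^k(\psi) - \bar c_i^k(\psi) + \beta \sum_{j=1}^{N} \bar p_{ij}^k(\psi)\, v_j\big(T_{ij}^k(\psi)\big) \Big\}, \qquad i = 1, \ldots, N, \tag{5.1.4} $$

where $T_{ij}^k(\psi)$ denotes the parameter of the posterior distribution after a transition from i to j is observed under alternative k.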


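Suppose, on the other hand, it is decided to cease sampling and operate the process indefinitely under the policy $\sigma$. Let

$$ \bar V_i(\sigma, \psi) = \int V_i(\sigma, \mathcal P)\, dH(\mathcal P \mid \psi), \qquad i = 1, \ldots, N;\ \psi \in \Psi;\ \sigma \in \Sigma, \tag{5.1.5} $$

be the unconditional expected discounted reward over an infinite period* when the policy $\sigma$ is used and the system starts from $(i, \psi)$. Then, if it is decided to cease sampling, the supremum of the expected discounted reward is

$$ \max_{\sigma \in \Sigma} \bar V_i(\sigma, \psi); \tag{5.1.6} $$

the maximum exists in (5.1.6) since $\Sigma$ is finite.

* Cf. Section 4.3.

Using equations (5.1.4) and (5.1.6), we have the following set of functional equations which must be satisfied by the vector function $v(\psi) = (v_1(\psi), \ldots, v_N(\psi))$:

$$ v_i(\psi) = \max \Big\{ \max_{k} \Big[ \bar q_i^k(\psi) - \bar c_i^k(\psi) + \beta \sum_{j=1}^{N} \bar p_{ij}^k(\psi)\, v_j\big(T_{ij}^k(\psi)\big) \Big],\ \max_{\sigma \in \Sigma} \bar V_i(\sigma, \psi) \Big\}, \qquad i = 1, \ldots, N;\ \psi \in \Psi;\ 0 < \beta < 1. \tag{5.1.7} $$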
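The structure of these functional equations can be illustrated by a small recursion. The Python sketch below is not the report's computational scheme: it assumes Dirichlet rows (so that $\psi$ is a table of counts), uses a hypothetical two-state example, and replaces the terminal term $\max_\sigma \bar V_i(\sigma, \psi)$ by a crude stand-in, namely the optimal discounted value computed at the prior mean. All names (`REWARD`, `COST`, `terminal`, `update`, `v`) are illustrative.

```python
from functools import lru_cache

BETA = 0.9
# Hypothetical example: REWARD[i][k][j] is the reward for i -> j under alternative k.
REWARD = (((1.0, 0.0), (0.0, 2.0)), ((0.5, 0.5),))
COST = (0.05, 0.05)   # c_j: cost of observing the system in state j

def mean_rows(psi):
    """Prior mean transition probabilities p_bar[i][k][j] for Dirichlet counts psi."""
    return tuple(tuple(tuple(m / sum(row) for m in row) for row in alts)
                 for alts in psi)

@lru_cache(maxsize=None)
def terminal(psi):
    """Stand-in (an assumption) for max over policies of V_bar_i(sigma, psi):
    the optimal discounted value of the known chain at the prior mean."""
    p = mean_rows(psi)
    n = len(psi)
    val = [0.0] * n
    for _ in range(300):
        val = [max(sum(p[i][k][j] * (REWARD[i][k][j] + BETA * val[j])
                       for j in range(n))
                   for k in range(len(psi[i])))
               for i in range(n)]
    return tuple(val)

def update(psi, i, k, j):
    """Posterior parameter after observing i -> j under alternative k."""
    alts = list(psi[i])
    row = list(alts[k])
    row[j] += 1
    alts[k] = tuple(row)
    return psi[:i] + (tuple(alts),) + psi[i + 1:]

@lru_cache(maxsize=None)
def v(n, i, psi):
    """n-th successive approximation, started from the terminal value."""
    stop = terminal(psi)[i]
    if n == 0:
        return stop
    p = mean_rows(psi)
    sample = max(
        sum(p[i][k][j] * (REWARD[i][k][j] - COST[j]
                          + BETA * v(n - 1, j, update(psi, i, k, j)))
            for j in range(len(psi)))
        for k in range(len(psi[i])))
    return max(sample, stop)

PSI0 = (((1, 1), (1, 1)), ((1, 1),))   # uniform prior counts

if __name__ == "__main__":
    print([v(n, 0, PSI0) for n in range(4)])
```

Because each approximation takes the larger of the sampling value and the terminal value, the successive values cannot decrease with n, which is the monotone behaviour exploited in Section 5.2.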
It is to be noted that the same symbol, $v_i(\psi)$, is used in equation

Theorem 5.1.1 Let $v_i(d, \psi)$ be the expected total discounted reward in Model I when the system starts in the generalized state $(i, \psi)$ and the sampling strategy $d \in D_i$ is used. Let

Consider the adaptive control problem of Section 3.1. Let $D_i'$ be the set of all possible sampling strategies, d, in the adaptive control problem when the system starts from state i. In Model I, if $d \in D_i$, it is clear that d is a possible strategy in the adaptive control problem; hence, $D_i \subset D_i'$ ($i = 1, \ldots, N$). Suppose $d \in D_i'$. Then d either prescribes a fixed policy, $\sigma$, for use on every transition $n > n'$ for some integer $n'$, or else not. In the first case, d is clearly a possible strategy for Model I. In the second case, d is also a possible strategy for Model I, viz., a strategy in which a terminal decision point is never reached under some or all possible sample histories. Thus, $D_i' \subset D_i$ and, therefore, $D_i = D_i'$. The proof of Theorem 3.1.1 is valid for an arbitrary reward structure, provided that the reward per transition is bounded; thus, the remainder of the proof of Theorem 5.1.1 follows the proof of Theorem 3.1.1 and will not be duplicated here. Q.E.D.





















We now show that, with probability one, a terminal decision point is reached in Model I if an optimal sampling strategy is used. Let $\mathcal Q$ denote the true state of nature; $\mathcal Q$ is assumed to be positive, as defined by equation (2.3.26).








Theorem 5.1.2 In Model I, if the true state of nature, $\mathcal Q$, is a positive matrix and if the terminal functions $\bar V_i(\sigma, \psi)$ are continuous in $\psi$ ($i = 1, \ldots, N$; $\sigma \in \Sigma$), then, with probability one, a terminal decision point is reached in an optimal sampling strategy.
Proof. The proof is by contradiction. Assume there is an optimal sampling strategy in which a terminal decision is never made. Then the process is sampled infinitely often under this strategy and at least one state, i, is entered infinitely often. Since at least one alternative, k, must be used infinitely often when in state i, Lemma 2.3.7 and the positivity of $\mathcal Q$ imply that every state is entered infinitely often, and, therefore, that at least one alternative in each state is

dominated by other alternatives after a finite number of transitions and can be eliminated from further consideration. Thus, we may assume, without loss of generality, that all alternatives are sampled infinitely often. By Theorem 2.3.8, the mass of the posterior distribution of $\tilde{\mathcal P}$ tends, with probability one, to concentrate at $\mathcal Q$ as n, the number of transitions, goes to infinity. That is, for any $\varepsilon > 0$, if $\psi_n$ is defined by equation

the limit holding with probability one. Let $H(\mathcal P \mid \psi^*)$ be the distribution function which places the unit mass of probability on $\mathcal Q$. Then, with

decision-maker is certain about the transition probabilities and
Blackwell [14] has shown that an optimal strategy exists for (5.1.12) in which a fixed policy, $\sigma \in \Sigma$, is used on every transition. Howard [22] has shown that the expected reward under this strategy is, in the notation of this proof,

Therefore, with probability one, a terminal decision point is reached after a finite number of transitions. Q.E.D.

Let the sequence of vector functions, $\{v(n, \psi)\}$, where $v(n, \psi) = (v_1(n, \psi), \ldots, v_N(n, \psi))$, be defined by the equations

Using equation (5.2.1), it is shown in this section that there exists a unique bounded solution to equation (5.1.7). Equation (5.2.1) can then be used as a computational tool to approximate this unique solution and, with this application in mind, a bound on the error, $e_i(n, \psi) = v_i(\psi) - v_i(n, \psi)$, is derived. With the aid of this bound, we show that

Lemma 5.2.1 If $R = \max |r_{ij}^k|$ and $C = \max \{c_i\}$, then the functions $v_i(n, \psi)$ defined by (5.2.1) have the bound

Proof. By equations (4.3.3) and (4.3.6),

Assume that (5.2.2) holds for n. Then

Theorem 5.2.2 If $v(n, \psi)$ is defined by equation (5.2.1), then $\{v_i(n, \psi)\}$ is a monotone increasing sequence ($i = 1, \ldots, N$; $\psi \in \Psi$) and $\lim_{n \to \infty} v_i(n, \psi)$ exists and is a solution to (5.1.7).

Proof. We show inductively that $\{v_i(n, \psi)\}$ is a monotone increasing sequence. Since $v_i(0, \psi) = \max_{\sigma} \bar V_i(\sigma, \psi)$, we have $v_i(1, \psi) \ge v_i(0, \psi)$. Assume that $v_i(n, \psi) \ge v_i(n-1, \psi)$ for $i = 1, \ldots, N$ and $\psi \in \Psi$. If $v_i(n, \psi) = \max_{\sigma} \bar V_i(\sigma, \psi)$, then $v_i(n+1, \psi) \ge v_i(n, \psi)$. Suppose that, for some $k \in \{1, \ldots, K_i\}$,

Theorem 5.2.3 There is a unique bounded solution to equation (5.1.7).
Proof. It was shown in Theorem 5.2.2 that there is at least one bounded solution, $v(\psi)$, to (5.1.7). Assume that $w(\psi) = (w_1(\psi), \ldots, w_N(\psi))$ is also a bounded solution. Let

$$ S_i(x, k, \psi) = \bar q_i^k(\psi) - \bar c_i^k(\psi) + \beta \sum_{j=1}^{N} \bar p_{ij}^k(\psi)\, x_j\big(T_{ij}^k(\psi)\big), \qquad k = 1, \ldots, K_i;\ i = 1, \ldots, N;\ \psi \in \Psi. \tag{5.2.9} $$

Assume $(i, \psi)$ is fixed. There are four cases.

Case 1. For some indices a and b belonging to $\{1, \ldots, K_i\}$, $w_i(\psi) = S_i(w, a, \psi)$ and $v_i(\psi) = S_i(v, b, \psi)$. Then

where $v_i(n, \psi)$ is defined by (5.2.1) and $v_i(\psi)$ is the unique bounded solution of (5.1.7). Let R and r be defined by equation (3.3.3). Then $e_i(n, \psi)$ has the bounds

Proof. By Theorem 5.2.2, $\{v_i(n, \psi)\}$ is a monotone increasing sequence with the limit $v_i(\psi)$; hence, $e_i(n, \psi) \ge 0$ ($n = 0, 1, 2, \ldots$). The remainder of the inequality (5.2.17) is proved by induction.
We firet asteblish that +, (4 )< 77 since V.(o.,1)<R & Pim ote 
ab lat 4 —~ Tey 4 Nom = = $08 


v, (3 r)< aS? Jel, ecog F {5.2.18) 
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ami (5.2.17) holds for ne 6. Assume the equation ie valid for n. ‘Then, 
erguing as in the proof of Thearen 5.2.9, there is an index ke Php eee, Ra 
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pallory 5.2.5 If x(n, Y) is defined by equation (5.2.4) and 
$v(\psi)$ is the unique bounded solution of equation (5.1.7), then $v(n, \psi) \to v(\psi)$ uniformly in $\psi$.
Proof. Since the error bound (5.2.17) is independent of $\psi$, the corollary follows from Theorem 5.2.4. Q.E.D.





Theorem 5.2.6 If the prior distribution function of $\tilde{\mathcal P}$ is $H(\mathcal P \mid \psi) \in \mathcal H$, a family of distributions continuous in $\psi$ which is closed under consecutive sampling, then $v(\psi)$, the unique bounded solution of (5.1.7), is a continuous function of $\psi$.
Proof. Since H is continuous in $\psi$, $\bar V_i(\sigma, \psi) = \int V_i(\sigma, \mathcal P)\, dH(\mathcal P \mid \psi)$ is a continuous function of $\psi$. Moreover, $T_{ij}^k(\psi)$ is continuous. Thus, by induction, $v_i(n, \psi)$ is continuous for $i = 1, \ldots, N$ and $n = 0, 1, 2, \ldots$. Since $v_i(n, \psi) \to v_i(\psi)$ uniformly in $\psi$, $v_i(\psi)$ is continuous

sampling in some states while continuing to sample in others. For example, if the marginal prior distribution of $\tilde p_i$ is loose, while the marginal prior distribution of the remaining $(N-1)$ rows of $\tilde P$ is tight, it may be profitable to sample only when the system is in state i. Model II admits this additional option.

As in Section 5.1, let $v_i(\psi)$ be the supremum of the expected total discounted reward over an infinite period when the system starts from the generalized state $(i, \psi)$. It is assumed that the decision-maker can sample the system, can use the system over a period of n transitions, or can make a terminal decision. If the system is sampled the consecutive sampling rule is operative and if the system is used the $\nu$-step sampling rule is operative. Thus, we shall assume that the prior distribution function of $\tilde{\mathcal P}$ is $H(\mathcal P \mid \psi) \in \mathcal H$, a family of distributions closed under the $\nu$-step sampling rule. Theorem 2.3.4 implies that $\mathcal H$ is the mixed extension of a family of distributions which is closed under the consecutive sampling rule and, therefore, it is also closed under consecutive sampling by Theorem 2.3.3. Thus, if the decision-maker is in the state $(i, \psi)$ and chooses either to sample or to use the process, the posterior distribution of $\tilde{\mathcal P}$ will be a member of $\mathcal H$.

If it is decided to sample when in state $(i, \psi)$, the supremum of the prior expected reward is given by equation (5.1.4). Suppose, on the other hand, it is decided to use the process under policy $\sigma$ for $n \ge 1$ transitions. The probability that the system will be observed in state j, unconditional with regard to the prior distribution of $\tilde{\mathcal P}$, given that the system starts in the generalized state $(i, \psi)$ and that n transitions are to be observed under the policy $\sigma$, is

the $(i, j)$th element of the prior expected n-step transition probability matrix under policy $\sigma$. Let $\bar q_i(n, \sigma, \psi)$ denote the prior expected discounted reward earned over n transitions under the policy $\sigma$ when the system starts from $(i, \psi)$. Both $\bar p_{ij}^{(n)}(\sigma, \psi)$ and $\bar q_i(n, \sigma, \psi)$ can be computed by the methods discussed in Section 4.4. Let $T_{ij}(n, \sigma, \psi)$ denote the parameter of the posterior distribution of $\tilde{\mathcal P}$ when the system starts from $(i, \psi)$ and is observed in state j after n transitions under the policy $\sigma$. The prior expected reward under these conditions is

and, if it is decided to use the system, the supremum of the expected discounted reward is

Finally, if it is decided to make a terminal decision when in the generalized state $(i, \psi)$, the supremum of the expected total discounted

We shall anticipate Theorem 5.3.1, which establishes the existence of an optimal sampling strategy for Model II, and write max for sup in equation (5.3.3). Then the vector function $v(\psi) = (v_1(\psi), \ldots, v_N(\psi))$ must satisfy the following functional equation:

We shall now consider some properties of Model II. These properties, for the most part, parallel those of Model I and the proofs are quite similar to those of Section 5.2. In the following two theorems it is shown that an optimal sampling strategy exists for Model II and that, in an optimal sampling strategy, a terminal decision point is reached with probability one. We then demonstrate the existence of a unique bounded solution to equation (5.3.5) and consider a method of successive approximations, together with a bound on the error of the nth approximant.

















Theorem 5.3.1 Let $v_i(d, \psi)$ be the expected total discounted reward in Model II when the system starts in the generalized state $(i, \psi)$ and the sampling strategy $d \in D_i$ is used. Let

$$ v_i(\psi) = \sup_{d \in D_i} v_i(d, \psi). \tag{5.3.6} $$

Then there is a sampling strategy $d^* \in D_i$ such that

$$ v_i(d^*, \psi) = v_i(\psi), \qquad i = 1, \ldots, N. \tag{5.3.7} $$

Proof. Let $D_i'$ be the set of all possible sampling strategies, d, in the adaptive control problem. Then $D_i \subset D_i'$. Suppose $d \in D_i'$. Then d either prescribes a fixed policy, $\sigma$, for use on every transition $n > n'$, for some integer $n'$, or else not. In either case, d is a possible strategy for Model II and $D_i' \subset D_i$. Thus, $D_i = D_i'$. The remainder of the proof is analogous to the proof of Theorem 3.1.1. Q.E.D.


Theorem 5.3.2 In Model II, if the true state of nature $\mathcal Q$ is a positive matrix and if the terminal functions $\bar V_i(\sigma, \psi)$ are continuous in $\psi$ ($i = 1, \ldots, N$; $\sigma \in \Sigma$), then, with probability one, a terminal decision point is reached in an optimal sampling strategy.

Proof. Assume there is an optimal sampling strategy in which a terminal decision is never made. We shall show a contradiction. Let a decision point in the sample history be a point in time at which the state of the system is made known to the decision-maker. Then the assumption is that there are an infinite number of decision points. There is at least one state, i, which is observed infinitely often and at least one policy, $\sigma$, and transition interval, n, which are used infinitely often in state i. Lemma 2.3.7 and the positivity of $\mathcal Q$ imply that every state is observed infinitely often with probability one. Since a terminal decision is never made, there is a finite integer, M, such that, if n is a transition interval which is used in the sampling strategy, then $n \le M$. For if not, an infinite transition interval is used at some stage, which is equivalent to a terminal decision. Thus, there is a finite set of ordered pairs, $(n, \sigma)$, where $n \in \{1, 2, \ldots, M\}$ and $\sigma \in \Sigma$, which describe the decisions made at each decision point. We may assume, without loss of generality, that all members of this finite set are used infinitely often in the sampling strategy, since any pair, $(n, \sigma)$, which is used only a finite number of times is eventually dominated. The conditions of Theorem 2.3.9 are satisfied, and, therefore, the mass of the posterior distribution of $\tilde{\mathcal P}$ tends, with probability one, to concentrate at $\mathcal Q$ as $\nu$, the number of decision points, goes to infinity. If $H(\mathcal P \mid \psi^*)$ is the distribution function which places the unit mass of probability on $\mathcal Q$, then $\psi_\nu \to \psi^*$ as $\nu \to \infty$ and equation (5.3.5)

It was shown, during the proof of Theorem 5.1.2, that if

$$ v_i(\psi^*) = \max_{k} \Big\{ \bar q_i^k(\psi^*) - \bar c_i^k(\psi^*) + \beta \sum_{j=1}^{N} \bar p_{ij}^k(\psi^*)\, v_j(\psi^*) \Big\}, $$

The argument remains valid when the distribution which places the unit mass of probability on $\mathcal Q$ is not a member of $\mathcal H$.

We may construct a new set of policies as follows. Let $a = (a_1, \ldots, a_N)$ be a policy vector, where $a_i = (n, \sigma)$ is a choice of a transition interval, $n \in \{1, \ldots, M\}$, and a policy, $\sigma \in \Sigma$. If the alternative $a_i$ is selected in state i, then the system goes to state j with probability $\bar p_{ij}^{(n)}(\sigma, \psi^*)$, earning the expected reward $\bar q_i(n, \sigma, \psi^*) - \beta^n c_j$. If A is the set of all possible alternatives a, then equation (5.3.10) can be written as

Equation (5.3.11) has the same formal structure as equation (5.1.11) in the proof of Theorem 5.1.2. An argument similar to that leading to equation (5.2.13) shows the contradiction

$$ v_i(\psi^*) < \max_{\sigma \in \Sigma} \bar V_i(\sigma, \psi^*), \qquad i = 1, \ldots, N. \tag{5.3.12} $$

Thus, with probability one, a terminal decision point is reached. Q.E.D.

Let us now consider the existence and uniqueness of solutions to the functional equation (5.3.5). Let the sequence of vector functions, $\{v(n, \psi)\}$, where $v(n, \psi) = (v_1(n, \psi), \ldots, v_N(n, \psi))$, be defined by the following equations:

Lemma 5.3.3 If $R = \max |r_{ij}^k|$ and $C = \max \{c_i\}$, then the functions $v_i(n, \psi)$ defined by (5.3.13) have the bound

Proof. The proof is by induction. Equation (5.2.4) shows that (5.3.14) holds for n = 0. Assume it holds for n. Since

Thus, from (5.3.13a) and the induction hypothesis, we have
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les(ntty 1 )|< maz |e + pos p Sgtei , 


BAR inp” 9 ) v Re x 
ZIP eSpccen mi +# Ro + BC+ Bs gig $ 


3 
a 
o 8 LY 
o mne EgtyAl, ao, TS, ant {Stel 
2 
. Se, (5.3046) 


Ge Bele 


Theorem 5.3.4 If $v(n, \psi)$ is defined by equation (5.3.13), then $\{v_i(n, \psi)\}$ is a monotone increasing sequence ($i = 1, \ldots, N$; $\psi \in \Psi$) and $\lim_{n \to \infty} v_i(n, \psi)$ exists and is a solution of (5.3.5).

















Proof. The proof that $\{v_i(n, \psi)\}$ is monotone increasing is inductive. Clearly, $v_i(1, \psi) \ge v_i(0, \psi)$ for $i = 1, \ldots, N$ and $\psi \in \Psi$. Assume that $v_i(n, \psi) \ge v_i(n-1, \psi)$. If, for some $\sigma$ and some integer $\nu$,








v, (Re 1) e Cop ¥) + 8” 5 Ser. +) Cv,(redy T. (uns t))90,] 


(503087) 
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Ne 
v, (nel Tov, (net) > 2. By (GY) (v,(rek, (vacot dow, (mob ety g(vecet))] 


= Ge (5.3248) 
Ee ¥, (rf) = Tee i, (p ry} 9 then ¥, (athe > V, (8s Ys etei, using 
(5.2.8), 2 RAV, $n all PASO, 
v, (ards 1) > V, (ny T)> ite B (5.3.49) 





proving monotonicity. By Lemma 5.3.3, the sequence $\{v_i(n, \psi)\}$ is bounded and, therefore, $\lim_{n \to \infty} v_i(n, \psi)$ exists. That the limit satisfies (5.3.5) follows by letting $n \to \infty$ in (5.3.13a). Q.E.D.


The remaining theorems, the proofs of which parallel very closely those of corresponding theorems in Section 5.2, are stated without proof.


Theorem 5.3.5 There is a unique bounded solution to equation (5.3.5).


Theorem 5.3.6 Let the error of the nth approximant, $v_i(n, \psi)$, be

$$ e_i(n, \psi) = v_i(\psi) - v_i(n, \psi), \qquad i = 1, \ldots, N;\ n = 0, 1, 2, \ldots;\ \psi \in \Psi, \tag{5.3.23} $$

where $v(\psi) = (v_1(\psi), \ldots, v_N(\psi))$ is the unique bounded solution of equation (5.3.5) and $v(n, \psi)$ is defined by (5.3.13). Then $e_i(n, \psi)$ has the bounds


O<qint)< ee, Gly veep NW (503624) 
grgcprece 
Vet 
Os P< 


where R and r are defined by (3.3.3).


Corollary 5.3.7 If $v(n, \psi)$ is defined by equation (5.3.13) and $v(\psi)$ is the unique bounded solution of (5.3.5), then $\{v(n, \psi)\} \to v(\psi)$ uniformly in $\psi$.





Theorem 5.3.8 If the prior distribution function of $\tilde{\mathcal P}$ is $H(\mathcal P \mid \psi) \in \mathcal H$, a family of distributions continuous in $\psi$ which is closed under $\nu$-step sampling, then $v(\psi)$, the unique bounded solution of (5.3.5), is continuous in $\psi$.





The numerical solution of Model II involves considerably more computation than does the solution of Model I. Not only does equation (5.3.13), the successive approximation scheme, require the evaluation of more terms than does the corresponding scheme for Model I, but the requirement that $\mathcal H$, the family of prior distributions of $\tilde{\mathcal P}$, be closed under $\nu$-step sampling implies that $\mathcal H$ is the mixed extension of a family of distributions closed under consecutive sampling, so that the number of parameters required for Model II is larger than that required for solving Model I. The additional complexity of Model II is probably worthwhile only in the case of a prior distribution which is tight on some rows of $\tilde{\mathcal P}$ and loose on others and where the cost of sampling is high.

We note that, while the aim of Model II is to allow the decision-maker to sample only those states in which there are transition probability vectors with loose marginal prior distributions, he does not have full control over the future states in which the system may be observed. For example, suppose it is desired to sample the system only when it is in state i. Then a sampling strategy must be chosen which trades off the expected discounted earnings of the system against the need for a high probability that the system enters state i at each decision point.

We remark, in this connection, that a decision to use the process when in state i does not necessarily imply that the consecutive sampling alternative is dominated at future decision points when the system is found in state i. Such dominance may hold under a sample history which reduces the marginal variances of the alternative transition probabilities in the ith state, but there is certainly no reason to expect this to be the case under a sequence of observations which increases the marginal variances of some of these transition probabilities.

















5.4 Approximate Terminal Decisions


In Models I and II it is necessary to evaluate expressions of the form

$$ \bar V_i(\sigma^*, \psi) = \max_{\sigma \in \Sigma} \bar V_i(\sigma, \psi), \tag{5.4.1} $$

where $\bar V_i(\sigma, \psi)$ is the expected total discounted reward earned over an infinite period under the policy $\sigma$ when the system starts from the generalized state $(i, \psi)$. Since $\Sigma$ may contain a large number of policies, it is desirable to find methods of solving (5.4.1) for the maximizing policy $\sigma^*$ which avoid a direct search over all elements of $\Sigma$. This problem has not been solved, but some preliminary remarks concerning the approximation of $\sigma^*$ are offered in this section. These remarks are also applicable to the problem of selecting a policy which maximizes the expected gain, $\bar g(\sigma, \psi)$, which was discussed in Section 4.4.

Let $V_i(\sigma, \mathcal P)$ be the conditional expected total discounted reward over an infinite period under policy $\sigma$ when the system starts from state i and $\tilde{\mathcal P} = \mathcal P$. The policy which maximizes this reward can be found efficiently by means of Howard's policy iteration algorithm [22]. It was seen in the proofs of Theorems 5.1.2 and 5.3.2 that, as the number of observations in Model I or Model II goes to infinity, the mass of the posterior probability distribution of $\tilde{\mathcal P}$ tends, with probability one, to concentrate at the true state of nature. Thus, if $\bar{\mathcal P}$ is the mean of the distribution of $\tilde{\mathcal P}$, we can approximate $\sigma^*$ by $\hat\sigma$, where $\hat\sigma$ is the policy which maximizes $V_i(\sigma, \bar{\mathcal P})$; the approximation can be expected to improve as the number of observations of the process goes to infinity. We consider here a bound on the error,
Serre Berry Ochs ty. (508.3) 
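The approximation just described requires only the solution of an ordinary discounted decision problem with a known transition matrix. The Python sketch below shows one textbook form of policy iteration for that purpose; it is a generic illustration and is not transcribed from [22]. The data layout `p[i][k][j]`, `r[i][k][j]` and the function names are assumptions of the sketch, and in use `p` would be the mean transition probabilities.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(M[row][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(n):
            if row != col:
                f = M[row][col] / M[col][col]
                for c in range(col, n + 1):
                    M[row][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]

def evaluate(policy, p, r, beta):
    """Discounted value of a fixed policy: solve (I - beta P) v = q."""
    n = len(p)
    A = [[(1.0 if i == j else 0.0) - beta * p[i][policy[i]][j]
          for j in range(n)] for i in range(n)]
    q = [sum(p[i][policy[i]][j] * r[i][policy[i]][j] for j in range(n))
         for i in range(n)]
    return solve(A, q)

def policy_iteration(p, r, beta):
    """Alternate value determination and policy improvement until stable."""
    n = len(p)
    policy = [0] * n
    while True:
        v = evaluate(policy, p, r, beta)
        new = []
        for i in range(n):
            scores = [sum(p[i][k][j] * (r[i][k][j] + beta * v[j])
                          for j in range(n)) for k in range(len(p[i]))]
            best = max(range(len(scores)), key=scores.__getitem__)
            # Keep the current alternative on ties so the loop terminates.
            if scores[policy[i]] >= scores[best] - 1e-12:
                best = policy[i]
            new.append(best)
        if new == policy:
            return policy, v
        policy = new

if __name__ == "__main__":
    # Hypothetical example: in state 0, stay for reward 1 or move to state 1 for 0;
    # state 1 pays 2 per transition forever.
    p = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0]]]
    r = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 2.0]]]
    print(policy_iteration(p, r, 0.9))
```

The policy returned is optimal for the supplied matrix only; how much is lost by using it under the full prior distribution is exactly the error considered in the remainder of this section.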
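Let the policies, $\sigma$, be indexed by j. Thus, $\Sigma = \{\sigma_1, \ldots, \sigma_J\}$, where J is the number of distinct policies in $\Sigma$. For a fixed index i, let $\mathcal S$ be partitioned into a set of J mutually exclusive and exhaustive subsets, $S_{ij}$, such that, if $\mathcal P \in S_{ij}$, then

$$ V_i(\sigma_j, \mathcal P) = \max_{\sigma \in \Sigma} V_i(\sigma, \mathcal P). \tag{5.4.4} $$

If $H(\mathcal P \mid \psi)$ is the prior distribution function of $\tilde{\mathcal P}$, let

$$ P_{ij}(\psi) = \int_{S_{ij}} dH(\mathcal P \mid \psi), \qquad j = 1, \ldots, J, \tag{5.4.5} $$

denote the prior probability that $\tilde{\mathcal P} \in S_{ij}$. Since the sets $S_{ij}$ partition $\mathcal S$,

$$ \sum_{j=1}^{J} P_{ij}(\psi) = 1, \qquad i = 1, \ldots, N;\ \psi \in \Psi. \tag{5.4.6} $$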

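Let $\bar V_{ij}(\sigma, \psi)$ be the conditional expected value of $V_i(\sigma, \tilde{\mathcal P})$, given that $\tilde{\mathcal P} \in S_{ij}$,

$$ \bar V_{ij}(\sigma, \psi) = \frac{1}{P_{ij}(\psi)} \int_{S_{ij}} V_i(\sigma, \mathcal P)\, dH(\mathcal P \mid \psi), \qquad j = 1, \ldots, J;\ \psi \in \Psi. \tag{5.4.7} $$

We note that (5.4.4) implies

$$ \bar V_{ij}(\sigma_j, \psi) \ge \bar V_{ij}(\sigma, \psi), \qquad j = 1, \ldots, J. \tag{5.4.8} $$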
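Let $R(\sigma)$ and $r(\sigma)$ be the maximum and minimum transition rewards when the policy $\sigma$ is used,

$$ R(\sigma) = \max_{i,j} r_{ij}^{\sigma}, \tag{5.4.9a} $$

$$ r(\sigma) = \min_{i,j} r_{ij}^{\sigma}, \tag{5.4.9b} $$

where $r_{ij}^{\sigma}$ is the reward for a transition from i to j under the alternative selected by $\sigma$ in state i, and let R and r be defined by equation (3.3.3). By equation (4.3.4),

$$ V_i(\sigma, \mathcal P) = \sum_{n=0}^{\infty} \beta^n \sum_{j=1}^{N} \sum_{k=1}^{N} p_{ij}^{(n)}\, p_{jk}\, r_{jk}, \qquad 0 \le \beta < 1, \tag{5.4.10} $$

with the transition probabilities and rewards taken under the policy $\sigma$, and, therefore,

$$ \frac{r(\sigma)}{1-\beta} \le V_i(\sigma, \mathcal P) \le \frac{R(\sigma)}{1-\beta} \tag{5.4.11} $$

and

$$ \frac{r(\sigma)}{1-\beta} \le \bar V_{ij}(\sigma, \psi) \le \frac{R(\sigma)}{1-\beta}, \qquad j = 1, \ldots, J. \tag{5.4.12} $$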
€ 
op ft 


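Lemma 5.4.1 For $j = 1, \ldots, J$, the following inequality is valid,

$$ P_{ij}(\psi)\, \bar V_{ij}(\sigma_j, \psi) \le \bar V_i(\sigma_j, \psi) - \big(1 - P_{ij}(\psi)\big) \frac{r(\sigma_j)}{1-\beta}, \qquad 0 \le \beta < 1. \tag{5.4.13} $$

Proof. Since the sets $S_{ij}$ partition $\mathcal S$, we have, using (5.4.12),

$$ \bar V_i(\sigma_j, \psi) = \bar V_{ij}(\sigma_j, \psi)\, P_{ij}(\psi) + \sum_{l \ne j} \bar V_{il}(\sigma_j, \psi)\, P_{il}(\psi) \ge \bar V_{ij}(\sigma_j, \psi)\, P_{ij}(\psi) + \big(1 - P_{ij}(\psi)\big) \frac{r(\sigma_j)}{1-\beta}. \tag{5.4.14} $$

Equation (5.4.13) is a rearrangement of (5.4.14). Q.E.D.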


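Lemma 5.4.2 For any policy $\sigma \in \Sigma$ and any index $j = 1, \ldots, J$,
the expected discounted reward under the policy $\sigma$ has the upper bound
$$\bar{V}_i(\sigma, \psi) \le \bar{V}_i(\sigma_j, \psi) + \frac{1 - P_{ij}(\psi)}{1-\beta}\,\bigl(R - r(\sigma_j)\bigr), \qquad 0 \le \beta < 1. \qquad (5.4.15)$$
Proof. We have, using equations (5.4.12), (5.4.8), and Lemma 5.4.1,
$$\bar{V}_i(\sigma, \psi) = \bar{V}_{ij}(\sigma, \psi) P_{ij}(\psi) + \sum_{k \ne j} \bar{V}_{ik}(\sigma, \psi) P_{ik}(\psi)$$
$$\le \bar{V}_{ij}(\sigma, \psi) P_{ij}(\psi) + \bigl(1 - P_{ij}(\psi)\bigr) \frac{R}{1-\beta}$$
$$\le \bar{V}_{ij}(\sigma_j, \psi) P_{ij}(\psi) + \bigl(1 - P_{ij}(\psi)\bigr) \frac{R}{1-\beta}$$
$$\le \bar{V}_i(\sigma_j, \psi) + \frac{1 - P_{ij}(\psi)}{1-\beta}\,\bigl(R - r(\sigma_j)\bigr). \qquad (5.4.16)$$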





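Theorem 5.4.3 Let $e_i(\sigma_j, \psi)$ be defined by
$$e_i(\sigma_j, \psi) = \bar{V}_i(\sigma^*, \psi) - \bar{V}_i(\sigma_j, \psi), \qquad j = 1, \ldots, J;\ i = 1, \ldots, N, \qquad (5.4.17)$$
where $\sigma^*$ is the maximizing terminal policy defined by equation (5.2.1).
Then $e_i(\sigma_j, \psi)$ has the bounds
$$0 \le e_i(\sigma_j, \psi) \le \frac{1 - P_{ij}(\psi)}{1-\beta}\,\bigl(R - r(\sigma_j)\bigr), \qquad 0 \le \beta < 1. \qquad (5.4.18)$$
Proof. By equation (5.4.1), $\bar{V}_i(\sigma^*, \psi) \ge \bar{V}_i(\sigma_j, \psi)$ and $e_i(\sigma_j, \psi) \ge 0$.
The upper half of the inequality follows from (5.4.15). Q.E.D.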


In order to bound $e_i(\sigma_j, \psi)$ using equation (5.4.18), it is necessary
to evaluate $P_{ij}(\psi)$. This problem has not been completely solved, chiefly
because there is no satisfactory method of finding the boundary
of the set $S_{ij}$. Moreover, $S_{ij}$ is not necessarily a connected set, which further
complicates the problem. The probability $P_{ij}(\psi)$ can be estimated by using
numerical or Monte Carlo techniques.
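A minimal sketch of the Monte Carlo estimate mentioned above, under assumptions that are not in the text: each row of $\tilde{P}$ is given an independent multivariate beta (Dirichlet) prior with parameter vector `alpha[state][alternative]`, rewards are stored as `r[state][alternative][next_state]`, the discounted value of a policy is found by simple fixed-point iteration, and the policy set is small enough to enumerate. The function names are hypothetical. The fraction of sampled matrices for which policy $\sigma_j$ maximizes $V_i(\sigma, P)$ estimates $P_{ij}(\psi)$.

```python
import itertools
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from a Dirichlet prior via normalized gammas."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]


def discounted_value(P, r, policy, beta, sweeps=500):
    """Iterate V <- q + beta * P_sigma V, which converges for 0 <= beta < 1."""
    n = len(policy)
    rows = [P[i][policy[i]] for i in range(n)]
    q = [sum(rows[i][j] * r[i][policy[i]][j] for j in range(n)) for i in range(n)]
    V = [0.0] * n
    for _ in range(sweeps):
        V = [q[i] + beta * sum(rows[i][j] * V[j] for j in range(n)) for i in range(n)]
    return V


def estimate_partition_probs(alpha, r, beta, i, n_samples, seed=0):
    """Estimate, for each policy, the prior probability that it is best from state i."""
    rng = random.Random(seed)
    n = len(alpha)
    policies = list(itertools.product(*[range(len(alpha[s])) for s in range(n)]))
    counts = [0] * len(policies)
    for _ in range(n_samples):
        # One draw of the whole array of transition probabilities from the prior.
        P = [[sample_dirichlet(a, rng) for a in alpha[s]] for s in range(n)]
        values = [discounted_value(P, r, pol, beta)[i] for pol in policies]
        counts[values.index(max(values))] += 1
    return policies, [c / n_samples for c in counts]
```

Ties between policies are credited to the first maximizer, so the estimate is only meaningful when ties have prior probability zero.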

If $g(\sigma, P)$ is the gain of a Markov chain with alternatives when
operated indefinitely under the policy $\sigma$, and if $\bar{g}(\sigma, \psi)$ is the
corresponding expected value when $\tilde{P}$ has the distribution function
$H(P \mid \psi)$, then, as we shall see in the next section, it is often necessary
to evaluate expressions of the form














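$$\bar{g}(\sigma^{\circ}, \psi) = \max_{\sigma \in \Sigma} \bar{g}(\sigma, \psi). \qquad (5.4.19)$$
If $\bar{P}$ is the mean of the distribution $H(P \mid \psi)$ we may wish to approximate
$\sigma^{\circ}$ by $\hat{\sigma}$, defined by the expression
$$g(\hat{\sigma}, \bar{P}) = \max_{\sigma \in \Sigma} g(\sigma, \bar{P}). \qquad (5.4.20)$$
There are efficient algorithms for the solution of (5.4.20) [46, 22]. Let
the error of the approximation of $\sigma^{\circ}$ by $\hat{\sigma}$ be defined as
$$e(\hat{\sigma}, \psi) = \bar{g}(\sigma^{\circ}, \psi) - \bar{g}(\hat{\sigma}, \psi). \qquad (5.4.21)$$
A bound on $e(\hat{\sigma}, \psi)$ similar to that of equation (5.4.18) is easily derived.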
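Let $\mathcal{S}_{KN}$ be the set of all positive $K \times N$ generalized stochastic
matrices and let $\mathcal{S}_{KN}$ be partitioned into $J$ sets, $S_j$, where, if $P \in S_j$,
$$g(\sigma_j, P) = \max_{\sigma \in \Sigma} g(\sigma, P). \qquad (5.4.22)$$
If $H(P \mid \psi)$ is the prior distribution function of $\tilde{P}$, let
$$P_j(\psi) = \int_{S_j} dH(P \mid \psi), \qquad j = 1, \ldots, J, \qquad (5.4.23)$$
be the prior probability that $\tilde{P} \in S_j$. Let
$$\bar{g}_j(\sigma, \psi) = \frac{1}{P_j(\psi)} \int_{S_j} g(\sigma, P)\, dH(P \mid \psi), \qquad j = 1, \ldots, J, \qquad (5.4.24)$$
be the conditional expectation of $g(\sigma, \tilde{P})$ given that $\tilde{P} \in S_j$. Then, by
(5.4.22),
$$\bar{g}_j(\sigma_j, \psi) \ge \bar{g}_j(\sigma, \psi), \qquad j = 1, \ldots, J. \qquad (5.4.25)$$
If $R(\sigma)$ and $r(\sigma)$ are defined by (5.4.9) and $R$ and $r$ are defined by
(3.3.3), (4.5.2) implies the inequalities
$$r(\sigma) \le g(\sigma, P) \le R(\sigma), \qquad (5.4.26)$$
$$r(\sigma) \le \bar{g}_j(\sigma, \psi) \le R(\sigma), \qquad j = 1, \ldots, J. \qquad (5.4.27)$$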





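Lemma 5.4.4 For $j = 1, \ldots, J$,
$$\bar{g}_j(\sigma_j, \psi)\, P_j(\psi) \le \bar{g}(\sigma_j, \psi) - \bigl(1 - P_j(\psi)\bigr)\, r(\sigma_j). \qquad (5.4.28)$$
Proof. Using (5.4.27),
$$\bar{g}(\sigma_j, \psi) = \bar{g}_j(\sigma_j, \psi) P_j(\psi) + \sum_{k \ne j} \bar{g}_k(\sigma_j, \psi) P_k(\psi)$$
$$\ge \bar{g}_j(\sigma_j, \psi) P_j(\psi) + \bigl(1 - P_j(\psi)\bigr)\, r(\sigma_j). \qquad (5.4.29)$$
Q.E.D.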


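Lemma 5.4.5 For any policy $\sigma \in \Sigma$ and any $j \in \{1, \ldots, J\}$,
$$\bar{g}(\sigma, \psi) \le \bar{g}(\sigma_j, \psi) + \bigl(1 - P_j(\psi)\bigr)\bigl(R - r(\sigma_j)\bigr). \qquad (5.4.30)$$
Proof. By (5.4.27), (5.4.25), and Lemma 5.4.4,
$$\bar{g}(\sigma, \psi) = \bar{g}_j(\sigma, \psi) P_j(\psi) + \sum_{k \ne j} \bar{g}_k(\sigma, \psi) P_k(\psi)$$
$$\le \bar{g}_j(\sigma_j, \psi) P_j(\psi) + \bigl(1 - P_j(\psi)\bigr) R$$
$$\le \bar{g}(\sigma_j, \psi) + \bigl(1 - P_j(\psi)\bigr)\bigl(R - r(\sigma_j)\bigr). \qquad (5.4.31)$$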





Theorem 5.4.6 The error function $e(\sigma_j, \psi)$ defined by equation
(5.4.21) has the bound
$$0 \le e(\sigma_j, \psi) \le \bigl(1 - P_j(\psi)\bigr)\bigl(R - r(\sigma_j)\bigr), \qquad j = 1, \ldots, J. \qquad (5.4.32)$$
Proof. The theorem follows directly from equations (5.4.19) and
(5.4.30). Q.E.D.








making decisions in an adaptive control model with no discounting. These
remarks apply as well to undiscounted terminal control models. One
criterion we may use is to consider the class of sampling strategies which
maximize the expected steady-state gain of the process, then to choose from
this class the strategy which maximizes the expected reward over the
transient period which precedes the terminal decision. This criterion is
made precise in the present section, where Models III and IV, the
undiscounted analogues of Models I and II, are introduced. No analysis of
these models has been carried out.











5.5.1 Model III. In Model III it is assumed that the process will
be sampled consecutively until a terminal decision point is reached, at
which time a terminal policy is selected and the system is operated
under this policy over a finite terminal operation period.

Let $v_i(\psi, \nu)$ be the supremum of the expected reward over a period
whose terminal operation phase lasts for $\nu$ transitions when the system
starts in state $i$ and the prior distribution of $\tilde{P}$ is $H(P \mid \psi) \in \mathcal{H}$, a
family of distributions closed under consecutive sampling. If, when in
state $(i, \psi)$, it is decided to sample at least once more, the supremum of
the prior expected reward is given by equation (5.1.4) with $\beta = 1$.

Since we shall be concerned with large values of $\nu$, we assume that,
once it is decided to cease sampling, a terminal policy will be selected
which maximizes the steady-state gain of the system, $\bar{g}(\sigma, \psi)$. Therefore,
the supremum of the expected reward is












BONS velo, Le N ° (505-2) 
cS 








Thus, under the assumptions of Model III, $v_i(\psi, \nu)$ must satisfy the
following functional equation,
RAS olf wk alg | 
v(t sv) S max ack<K, 5 aor) = (tr) + - HE Heck (ravi 


we PET) 
40%, co0g N (50502) 
6 oo 
Ud Zo De ece 
The arguments of Sections 5.2 and 5.3 required that the discount factor
$\beta$ be less than unity and, therefore, are not directly applicable to Model
III. The existence and properties of solutions to (5.5.2) are matters
for future investigation.





Equation (5.5.2) yields the supremum of the expected reward over a
period with terminal phase of length $\nu$ and also yields a decision for
the current transition interval. This decision, which is either the
selection of an alternative to be sampled or a terminal policy, will be
called a $\nu$-optimal decision. A $\nu$-optimal decision which is the same for
all $\nu$ sufficiently large will be called an optimal decision. Since, for
large $\nu$, every $\nu$-optimal decision maximizes the expected gain and also
maximizes the total reward over the sampling period, it is seen that an
optimal decision, as defined here, satisfies the criterion set forth at
the beginning of the section. The existence and nature of optimal decisions
have not yet been investigated.











5.5.2 Model IV. We now assume that the decision-maker can sample
or use the process at any time prior to the terminal decision, independent
of his past decisions. Let $v_i(\psi, \nu)$ be the supremum of the expected reward
over a period with terminal operation phase of length $\nu$ when the system
starts in state $i$ with the prior distribution $H(P \mid \psi)$. It is assumed that
$H(P \mid \psi) \in \mathcal{H}$, a family of distributions closed under $\nu$-step sampling.
By the arguments of Section 5.3 and of the previous paragraphs, it
is seen that $v_i(\psi, \nu)$ must satisfy the following functional equation,





v, ( sv) 3 max 


N ai ig 
tksR, Le o A(T) + os 446 Te 46F, C1980) 






8 
Bed... WMG + 2 APH 
Pi bv (2, a(n Te) gv) -c,3} 


met S$ .86c, 1) 
dehy seeg N (50509) 
tet 
VOL 253 y 00s 
The remarks made above concerning $\nu$-optimal decisions and optimal decisions
apply as well to Model IV.
By approximating optimal decisions for the undiscounted terminal control
models by means of $\nu$-optimal decisions we have emphasized the fact that
an undiscounted infinite horizon model is an approximation to a system
which runs for a long, but finite, period. We can equally well view the
undiscounted model as an approximation to a system with a discount factor
very close to unity. Thus, another approach to the solution of the
undiscounted terminal control problem is to let $\beta \to 1$ in Models I and II.
The existence and properties of solutions obtained in this manner and their
relation to solutions obtained via $\nu$-optimal decisions have not yet been
investigated.














5.6 Processes With Set-up Costs

In many processes which can be modelled as a Markov chain with
alternatives there is a cost associated with changing alternatives in each
state. Such a set-up cost could include, for example, the cost of starting
the operation of alternative $k$ and of shutting down alternative $j$ when
in state $i$. Set-up costs can easily be introduced into Models I and II;
we illustrate how this is done in Model I for a fixed cost, $S$, which is
incurred for each change of alternative made. The method is easily
generalized to the case in which $S$ is a function of the state in which
the change is made and of the alternatives involved in the change, and is
also applicable to the adaptive control model of Chapter 3.

Let $\sigma = (\sigma_1, \ldots, \sigma_N)$ denote the policy under which the system
is currently operating, where $\sigma_i$ is the index of the alternative in use
in the $i$th state ($\sigma_i = 1, \ldots, K_i$). We now define the generalized state of
the system as $(i, \psi, \sigma)$, where $i$ is the physical state of the system
($i = 1, \ldots, N$), $\psi$ indexes the prior distribution of $\tilde{P}$ ($\psi \in \Psi$), and
$\sigma$ is the policy currently in use ($\sigma \in \Sigma$). Let $v_i(\psi, \sigma)$ be the supremum
of the expected total discounted reward over an infinite period if the
system starts in the generalized state $(i, \psi, \sigma)$. The prior distribution
function of $\tilde{P}$ is assumed to belong to a family of distributions closed
under consecutive sampling.

If the system is in state $(i, \psi, \sigma)$ and it is decided to sample
alternative $k$, the supremum of the expected reward is

(Bg, 70S + RECT) = BACH) + B : Bes T Cap CT GC) (5-5-8) 
Bd 





















o & & 
22% cong (2) = ( ve eeag TF defined by 











Capacat OF apd 


= it, ae | (5.6.2) 
is the new policy vector when alternative $k$ is chosen in state $i$; $\delta$
is the Kronecker delta. The quantities $\bar{q}_i^k(\psi)$ and $\bar{p}_{ij}^k(\psi)$ are defined by
equations (3.2.4) and (5.2.3), respectively.
If it is decided to cease sampling when the system is in state
$(i, \psi, \sigma)$ and operate indefinitely under the policy $\sigma^{\circ} = (\sigma^{\circ}_1, \ldots, \sigma^{\circ}_N)$,
the expected reward is


N a 
S as (Scguae @4} + Yala t)s (52603) 


where $\bar{V}_i(\sigma^{\circ}, \psi)$ is the prior expected discounted reward for operating the
system over an infinite period under the policy $\sigma^{\circ}$, starting from state $i$,
when $H(P \mid \psi)$ is the prior distribution of $\tilde{P}$.

Thus, in Model I with a fixed set-up cost, $v_i(\psi, \sigma)$ must satisfy the
following functional equation,


a h ie 
ular tak, 1 Ceoget borgir a=seiter + BE, By gov 6% C49 le)? 





N + | 
Te ‘3 pa fao- oigeh) + ¥, Cr nh | 
$23, coves H (360i) 
Yet 
am:*> 








CHAPTER 6 
DISTRIBUTION THEORY


In this chapter we introduce some probability mass functions and
density functions which will be required for the next chapter, where we
do the prior-posterior and preposterior analysis of a Markov chain observed
under the consecutive sampling rule. The Whittle, Whittle-1, and
Whittle-2 probability mass functions are defined in Section 6.1 and formulas
for their moments are derived. The multivariate beta density function is
considered in Section 6.2 and is used to define the matrix beta density
function in Section 6.3. Some extensions of the matrix beta distribution
are considered in Section 6.4 and the chapter concludes with a discussion
of the beta-Whittle probability mass function.

The multivariate beta density function, as defined by equation (6.2.1)
below, was introduced by Mauldon [30] in 1959; Mosimann [31] has studied
the main properties of this distribution. The matrix beta distribution
was used by Silver [26], but not under that name. The Whittle and
beta-Whittle distributions are original with the present work.


6.1 The Whittle Distribution

Let $\tilde{\mathbf{x}}_n = (\tilde{x}_0, \tilde{x}_1, \ldots, \tilde{x}_n)$ be the sequence of consecutive observations
of the states of a Markov chain over a period of $n$ transitions, where
$\tilde{x}_0 = u$ is the state of the system prior to the first transition. The
range set of the random variables $\tilde{x}_m$ is the set of integers which index
the states of the chain, $\{1, \ldots, N\}$. It is assumed that the transitions
are governed by a known $N \times N$ stochastic matrix, $P = [p_{ij}]$, and that the
distribution of the initial state, $\tilde{x}_0$, is a known stochastic row vector,
$p = (p_1, \ldots, p_N)$.


Given a sample outcome, $\mathbf{x}_n$, we define the statistic $f_{ij}$ as the number
of indices $m \in \{0, 1, \ldots, n-1\}$ such that $x_m = i$ and $x_{m+1} = j$ ($i, j = 1, \ldots, N$).
In other words, $f_{ij}$ is the number of occurrences of a transition from state
$i$ to state $j$ in the sample $\mathbf{x}_n$. Let $F = [f_{ij}]$, an $N \times N$ matrix, be the
transition count of the sample. Prior to the observation of $\mathbf{x}_n$, $\tilde{F}$ and
$\tilde{\mathbf{x}}_n$ are random quantities whose joint distribution is studied in this section.


Let
$$f_{i\cdot} = \sum_{j=1}^{N} f_{ij}, \qquad i = 1, \ldots, N, \qquad (6.1.1)$$
$$f_{\cdot j} = \sum_{i=1}^{N} f_{ij}, \qquad j = 1, \ldots, N, \qquad (6.1.2)$$
be the row and column sums of $F$. With the exception of the initial and
final transitions, every transition into state $i$ in the sample $\mathbf{x}_n$ must be
followed by a transition out of state $i$. Therefore, the elements of $F$
are constrained by the equations
$$f_{i\cdot} - f_{\cdot i} = \delta_{iu} - \delta_{iv}, \qquad i = 1, \ldots, N, \qquad (6.1.3)$$
where $u = x_0$ is the initial state and $v = x_n$ is the final state.


The following lemma shows that, given a transition count $F$ and an
initial state $u$, the final state of the sample is uniquely determined.
A similar derivation shows that, given $F$ and $v$, $u$ is uniquely determined.
Lemma 6.1.5, below, shows that $F$ does not necessarily uniquely determine
both $u$ and $v$.





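Lemma 6.1.1 Let $u \in \{1, \ldots, N\}$ be fixed. If $F$ is an $N \times N$ matrix
of nonnegative integers which satisfies the equations
$$f_{i\cdot} - f_{\cdot i} = \delta_{iu} - \delta_{iv}, \qquad i = 1, \ldots, N, \qquad (6.1.4)$$
for some integer $v \in \{1, \ldots, N\}$, then $v$ is the only positive integer for
which (6.1.4) is true.
Proof. The proof is by contradiction. Assume that $v$ and $w$ both
satisfy (6.1.4). If $u \ne v$, then, taking $i = v$,
$$f_{v\cdot} - f_{\cdot v} = -1 = \delta_{vu} - \delta_{vw} = -\delta_{vw},$$
which implies that $v = w$. If $u = v$ then
$$f_{i\cdot} - f_{\cdot i} = 0, \qquad i = 1, \ldots, N,$$
and
$$f_{i\cdot} - f_{\cdot i} = \delta_{iu} - \delta_{iw}, \qquad i = 1, \ldots, N,$$
so that $\delta_{iu} = \delta_{iw}$ for every $i$ and $w = u = v$. Q.E.D.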
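As an illustration of Lemma 6.1.1, the following minimal sketch recovers the final state from a transition count and an initial state by testing the constraint $f_{i\cdot} - f_{\cdot i} = \delta_{iu} - \delta_{iv}$ for each candidate $v$. The function names `transition_count` and `final_state` are hypothetical, and states are numbered from 1 as in the text.

```python
def transition_count(path, n_states):
    """Build the N x N transition count F from a path of 1-based states."""
    F = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(path, path[1:]):
        F[a - 1][b - 1] += 1
    return F


def final_state(F, u):
    """Return the unique v with f_i. - f_.i = delta_iu - delta_iv for all i."""
    n = len(F)
    # Row sum minus column sum for every state.
    diff = [sum(F[i]) - sum(F[k][i] for k in range(n)) for i in range(n)]
    candidates = [
        v for v in range(1, n + 1)
        if all(diff[i - 1] == (i == u) - (i == v) for i in range(1, n + 1))
    ]
    if len(candidates) != 1:
        raise ValueError("F and u do not satisfy the flow constraint")
    return candidates[0]
```

For the path 1, 2, 3, 2, 1, 1, 3 the count matrix together with $u = 1$ gives back $v = 3$.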


Let $I = \{0, 1, 2, \ldots\}$ denote the set of all nonnegative integers.
For fixed $u \in \{1, \ldots, N\}$, $v \in \{1, \ldots, N\}$, $n \in \{1, 2, 3, \ldots\}$, and $P$,
define the following set of $N \times N$ matrices, $F = [f_{ij}]$:
$$\Phi_N(u, v, n, P) = \Bigl\{ F \;\Big|\; f_{ij} \in I,\ \sum_{i=1}^{N}\sum_{j=1}^{N} f_{ij} = n,\ f_{i\cdot} - f_{\cdot i} = \delta_{iu} - \delta_{iv},\ f_{ij} = 0 \text{ if } p_{ij} = 0,\ i, j = 1, \ldots, N \Bigr\}. \qquad (6.1.5)$$
Let
$$\Phi_N(u, n, P) = \bigcup_{v=1}^{N} \Phi_N(u, v, n, P), \qquad u = 1, \ldots, N. \qquad (6.1.6)$$
It is clear that $\Phi_N(u, n, P)$ is the set of all possible transition counts
$F$ which can arise from a sample of $n$ consecutive transitions in a Markov
chain with transition matrix $P$ and initial state $u$.


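6.1.1 The Whittle Distribution. The $N \times N$ random matrix $\tilde{F} = [\tilde{f}_{ij}]$
with range set $\Phi_N(u, n, P)$, as defined by (6.1.6), is said to have the Whittle distribution with
parameter $(u, n, P)$ if $\tilde{F}$ has the joint probability mass function
$$f_W^{(N)}(F \mid u, n, P) = F^*_{vu}\, \frac{\prod_{i=1}^{N} f_{i\cdot}!}{\prod_{i=1}^{N}\prod_{j=1}^{N} f_{ij}!}\, \prod_{i=1}^{N}\prod_{j=1}^{N} p_{ij}^{f_{ij}}, \qquad F \in \Phi_N(u, n, P),$$
$$= 0, \qquad \text{otherwise}, \qquad (6.1.7)$$
where $u = 1, \ldots, N$ and $n = 1, 2, 3, \ldots$.
The index $v$ is the unique solution of the equations
$$f_{i\cdot} - f_{\cdot i} = \delta_{iu} - \delta_{iv}, \qquad i = 1, \ldots, N,$$
and $F^*_{vu}$ is the $(v, u)$th cofactor of the $N \times N$ matrix $F^* = [f^*_{ij}]$ defined
by
$$f^*_{ij} = \delta_{ij} - \frac{f_{ij}}{f_{i\cdot}}, \qquad f_{i\cdot} > 0, \qquad (6.1.8a)$$
$$f^*_{ij} = \delta_{ij}, \qquad f_{i\cdot} = 0. \qquad (6.1.8b)$$
Since, in (6.1.7), there may be some $p_{ij} = 0$, we use the convention $0^0 = 1$.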
We have called the maze functhen (6.1.7) the Whibtle dintet 233: 








HOR f roe 
b fen}, gat Pi N 
Ee ¢),aemeb? va iT, f, 8 


Whittle's derivation of (6.1.9), and subsequent proofs of this relation
by Dawson and Good [15] and by Goodman [21], were obtained under the
restriction $f_{ij} > 0$ ($i, j = 1, \ldots, N$). Billingsley [10], in a particularly
elegant proof of (6.1.9), did not require this restriction.
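The mass function (6.1.7) can be checked numerically on a small chain. The sketch below is illustrative only: it evaluates the cofactor form with exact rational arithmetic and compares it with a brute-force enumeration of every path of $n$ transitions. The names `whittle_pmf` and `brute_force` are hypothetical, states are indexed from 0 for convenience, and the determinant uses Laplace expansion, which is adequate only for small $N$.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def det(M):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if not M:
        return Fraction(1)
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def cofactor(M, r, c):
    """(r, c)th cofactor of M, with 0-based indices."""
    minor = [row[:c] + row[c + 1:] for k, row in enumerate(M) if k != r]
    return (-1) ** (r + c) * det(minor)


def whittle_pmf(F, u, P):
    """Whittle mass function for count matrix F and 0-based initial state u."""
    N = len(F)
    rows = [sum(F[i]) for i in range(N)]
    diff = [rows[i] - sum(F[k][i] for k in range(N)) for i in range(N)]
    # The final state v solves f_i. - f_.i = delta_iu - delta_iv.
    finals = [v for v in range(N)
              if all(diff[i] == (i == u) - (i == v) for i in range(N))]
    if not finals:
        return Fraction(0)
    v = finals[0]
    Fstar = [[Fraction(int(i == j)) - (Fraction(F[i][j], rows[i]) if rows[i] else 0)
              for j in range(N)] for i in range(N)]
    value = cofactor(Fstar, v, u)
    for i in range(N):
        value *= factorial(rows[i])
        for j in range(N):
            # Fraction(0) ** 0 evaluates to 1, matching the convention 0^0 = 1.
            value = value * P[i][j] ** F[i][j] / factorial(F[i][j])
    return value


def brute_force(u, n, P):
    """Probability of each count matrix, by enumerating all paths of n transitions."""
    N = len(P)
    table = {}
    for tail in product(range(N), repeat=n):
        path = (u,) + tail
        prob = Fraction(1)
        F = [[0] * N for _ in range(N)]
        for a, b in zip(path, path[1:]):
            prob *= P[a][b]
            F[a][b] += 1
        key = tuple(map(tuple, F))
        table[key] = table.get(key, Fraction(0)) + prob
    return table
```

Agreement between the two computations for every reachable count matrix is a direct check of the counting identity referred to in the text as (6.1.9).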





6.1.2 Moments of the Whittle Distribution. In this section we derive formulas
for the means, variances, and covariances of the elements of $\tilde{F}$. Before
presenting these results, however, it is necessary to summarize certain
facts from the theory of matrices.

Let $P$ be an $N \times N$ matrix with eigenvalues $\lambda_1, \ldots, \lambda_N$, assumed to be
distinct. Let $g(x)$ be an arbitrary scalar polynomial, $a_0 + a_1 x + \cdots + a_r x^r$,
and let $g(P)$ be the corresponding matrix polynomial, $a_0 I + a_1 P + \cdots + a_r P^r$.
Sylvester's Theorem states that
$$g(P) = \sum_{m=1}^{N} g(\lambda_m)\, A^{(m)}, \qquad (6.1.10)$$
where the $N \times N$ matrices $A^{(m)}$ are defined by the expression
$$A^{(m)} = \prod_{k \ne m} \frac{P - \lambda_k I}{\lambda_m - \lambda_k}, \qquad m = 1, \ldots, N. \qquad (6.1.11)$$
These matrices have the following properties:
$$A^{(m)} A^{(k)} = 0, \qquad m \ne k, \qquad (6.1.12a)$$
$$A^{(m)} A^{(m)} = A^{(m)}, \qquad m = 1, \ldots, N. \qquad (6.1.12b)$$
A formula similar to (6.1.10), called the confluent form of Sylvester's
theorem, is available in the case of repeated eigenvalues.

If $P$ is an ergodic stochastic matrix, i.e., if $P$ is the transition
matrix of a single nonperiodic Markov chain, then exactly one eigenvalue
has the value unity and all other eigenvalues have moduli less than unity.
We shall adopt the convention that $\lambda_1$ is the unit root:
$$\lambda_1 = 1, \qquad (6.1.13a)$$
$$|\lambda_m| < 1, \qquad m = 2, \ldots, N. \qquad (6.1.13b)$$
Then the matrix $A^{(1)}$ is an $N \times N$ matrix each row of which is the steady
state vector $\pi = (\pi_1, \ldots, \pi_N)$ defined by the relation $\pi = \pi P$.
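A small worked instance of Sylvester's Theorem may help. The sketch below is restricted to a $2 \times 2$ ergodic stochastic matrix, for which the eigenvalues are $1$ and $\operatorname{tr} P - 1$, so no eigenvalue routine is needed. The function names are hypothetical and exact rational arithmetic is used so that equalities can be tested exactly.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def sylvester_2x2(P):
    """Eigenvalues and constituent matrices A^(1), A^(2) of a 2 x 2 stochastic matrix."""
    lam1 = Fraction(1)
    lam2 = P[0][0] + P[1][1] - 1  # the trace equals the sum of the eigenvalues
    eye = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
    A1 = [[(P[i][j] - lam2 * eye[i][j]) / (lam1 - lam2) for j in range(2)]
          for i in range(2)]
    A2 = [[(P[i][j] - lam1 * eye[i][j]) / (lam2 - lam1) for j in range(2)]
          for i in range(2)]
    return (lam1, lam2), (A1, A2)


def power_via_sylvester(P, n):
    """P^n computed as lam1^n A^(1) + lam2^n A^(2)."""
    (lam1, lam2), (A1, A2) = sylvester_2x2(P)
    return [[lam1 ** n * A1[i][j] + lam2 ** n * A2[i][j] for j in range(2)]
            for i in range(2)]
```

For $P$ with rows $(9/10, 1/10)$ and $(2/5, 3/5)$ the second eigenvalue is $1/2$ and both rows of $A^{(1)}$ equal the steady state vector $(4/5, 1/5)$.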


Theorem 6.1.2 If the $N \times N$ random matrix $\tilde{F}$ has the Whittle
distribution with parameter $(u, n, P)$, then the expected value of $\tilde{f}_{ij}$ is
$$E[\tilde{f}_{ij}] = \bar{f}_{ij}(u, n) = p_{ij} \sum_{m=0}^{n-1} p_{ui}^{(m)}, \qquad i, j = 1, \ldots, N;\ u = 1, \ldots, N, \qquad (6.1.14)$$
where $p_{ui}^{(m)}$ is the $(u, i)$th element of $P^m$. If, furthermore, $P$ is ergodic
and the eigenvalues, $\lambda_1, \ldots, \lambda_N$, of $P$ are distinct, then the expected
value of $\tilde{f}_{ij}$ has the spectral representation
$$\bar{f}_{ij}(u, n) = p_{ij} \Bigl( n \pi_i + \sum_{m=2}^{N} \frac{1 - \lambda_m^n}{1 - \lambda_m}\, a_{ui}^{(m)} \Bigr), \qquad (6.1.15)$$
where $A^{(m)} = [a_{ij}^{(m)}]$ is defined by (6.1.11).

Proof. Let $\tilde{f}_{ij}(u, n)$ be the number of transitions from $i$ to $j$ in a
sample of $n$ transitions which has initial state $u$. Prior to the
observation of the sample, $\tilde{f}_{ij}(u, n)$ is a random variable. If the system
starts in state $u$ and the first transition is to state $k$, then $\tilde{f}_{ij}(u, n)$
satisfies the equations
$$\tilde{f}_{ij}(u, n) = \delta_{ui}\delta_{kj} + \tilde{f}_{ij}(k, n-1), \qquad n = 2, 3, \ldots, \qquad (6.1.16a)$$
and
$$\tilde{f}_{ij}(u, 1) = \delta_{ui}\delta_{kj}. \qquad (6.1.16b)$$
Thus, $\bar{f}_{ij}(u, n)$ satisfies the equations
$$\bar{f}_{ij}(u, n) = \delta_{ui}\, p_{ij} + \sum_{k=1}^{N} p_{uk}\, \bar{f}_{ij}(k, n-1), \qquad n = 2, 3, \ldots, \qquad (6.1.17a)$$
$$\bar{f}_{ij}(u, 1) = \delta_{ui}\, p_{ij}. \qquad (6.1.17b)$$
We shall prove inductively that
$$\bar{f}_{ij}(u, n) = p_{ij} \sum_{m=0}^{n-1} p_{ui}^{(m)}, \qquad i, j = 1, \ldots, N;\ u = 1, \ldots, N. \qquad (6.1.18)$$
Since $p_{ui}^{(0)} = \delta_{ui}$, (6.1.17b) is satisfied by (6.1.18) for $n = 1$. Assume
(6.1.18) holds for $n$. Then, using (6.1.17a),
$$\bar{f}_{ij}(u, n+1) = \delta_{ui}\, p_{ij} + \sum_{k=1}^{N} p_{uk}\, p_{ij} \sum_{m=0}^{n-1} p_{ki}^{(m)} = p_{ij}\Bigl( p_{ui}^{(0)} + \sum_{m=1}^{n} p_{ui}^{(m)} \Bigr) = p_{ij} \sum_{m=0}^{n} p_{ui}^{(m)}, \qquad (6.1.19)$$
proving the induction.
If all the eigenvalues of $P$ are distinct, Sylvester's Theorem yields
$$P^r = \sum_{m=1}^{N} \lambda_m^r A^{(m)}, \qquad r = 0, 1, 2, \ldots. \qquad (6.1.20)$$
If, furthermore, $P$ is ergodic and $\lambda_1 = 1$ is the only eigenvalue of unit
modulus, equation (6.1.18) can be written
$$\bar{f}_{ij}(u, n) = p_{ij} \sum_{r=0}^{n-1} \sum_{m=1}^{N} \lambda_m^r a_{ui}^{(m)} = p_{ij} \Bigl( n \pi_i + \sum_{m=2}^{N} \frac{1 - \lambda_m^n}{1 - \lambda_m}\, a_{ui}^{(m)} \Bigr). \qquad (6.1.21)$$
Q.E.D.
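The closed form for the mean transition count can be compared with the first-step recursion used in the proof. The following sketch is an illustration under the assumption that `P` is supplied as a list of rows of `Fraction` values, with states indexed from 0. The function names are hypothetical.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mean_counts_closed_form(P, u, n):
    """Mean count of i -> j transitions: p_ij * sum_{m=0}^{n-1} (P^m)[u][i]."""
    N = len(P)
    power = [[Fraction(int(i == j)) for j in range(N)] for i in range(N)]  # P^0
    visits = [Fraction(0)] * N  # expected number of transitions leaving each state
    for _ in range(n):
        visits = [visits[i] + power[u][i] for i in range(N)]
        power = mat_mul(power, P)
    return [[P[i][j] * visits[i] for j in range(N)] for i in range(N)]


def mean_counts_recursive(P, u, n):
    """The same means obtained by conditioning on the first transition."""
    N = len(P)
    first = [[P[i][j] if i == u else Fraction(0) for j in range(N)]
             for i in range(N)]
    if n == 1:
        return first
    tails = [mean_counts_recursive(P, k, n - 1) for k in range(N)]
    return [[first[i][j] + sum(P[u][k] * tails[k][i][j] for k in range(N))
             for j in range(N)] for i in range(N)]
```

The recursive version costs on the order of $N^n$ operations and is intended only as a check on small cases. The closed form needs $n$ matrix products.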








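Theorem 6.1.3 If the $N \times N$ random matrix $\tilde{F}$ has the Whittle
distribution with parameter $(u, n, P)$, then, for $\alpha, \beta, \gamma, \delta = 1, \ldots, N$, the
covariance between $\tilde{f}_{\alpha\beta}$ and $\tilde{f}_{\gamma\delta}$ is
$$\operatorname{cov}[\tilde{f}_{\alpha\beta}, \tilde{f}_{\gamma\delta}] = \bar{f}_{\alpha\beta}(u, n)\bigl[\delta_{\alpha\gamma}\delta_{\beta\delta} - \bar{f}_{\gamma\delta}(u, n)\bigr] + \sum_{k=1}^{n-1} \bigl[ p_{u\alpha}^{(k-1)} p_{\alpha\beta}\, \bar{f}_{\gamma\delta}(\beta, n-k) + p_{u\gamma}^{(k-1)} p_{\gamma\delta}\, \bar{f}_{\alpha\beta}(\delta, n-k) \bigr], \qquad n = 2, 3, \ldots, \qquad (6.1.22a)$$
$$= \delta_{u\alpha}\, p_{\alpha\beta} \bigl( \delta_{\alpha\gamma}\delta_{\beta\delta} - \delta_{u\gamma}\, p_{\gamma\delta} \bigr), \qquad n = 1. \qquad (6.1.22b)$$
If $P$ is ergodic and the eigenvalues of $P$, $\lambda_1, \ldots, \lambda_N$, are all distinct,
the covariance of $\tilde{f}_{\alpha\beta}$ and $\tilde{f}_{\gamma\delta}$ has the spectral representation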
N rn 
Os oe a Z = (a) 
eav ied, 2 
Ct 5, ys = Pop ay tar, ; ans Zo-,, ua 


R 
Bo) (a) (m) 
PapPys PE, O nl Cir, et Ty a] 
NW Uae ate 
ee (hay te 8dy ale 2) 
m2 jo2 lok, Lek bed 
“ Delon the (m) , (1a) 
* Pi? me F Ae. ie Cr (a. t By ) 
4 n 
ob (m), (a) : tons +(mel) A. (mm) (a). (mn) _¢ 
: ms + a i] v 4 Yeas (a. ? ay a) 
i (2)(5), (8),(3) gy red nel tel 
+e 5 Sue Shy* Say Mo pte MO at 6.2.29) 
mee jel iw J toh, were 
ign J 


WON n> Z. 
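Proof. If $\tilde{f}_{\alpha\beta}(u, n)$ is the number of transitions from state $\alpha$ to
state $\beta$ in a sample of $n$ transitions when the chain starts in state $u$,
equation (6.1.16) and the relation $\delta_{ij}\delta_{ij} = \delta_{ij}$ imply that, if the
first transition is from $u$ to $k$,
$$\tilde{f}_{\alpha\beta}(u, n)\, \tilde{f}_{\gamma\delta}(u, n) = \delta_{u\alpha}\delta_{k\beta}\delta_{\alpha\gamma}\delta_{\beta\delta} + \delta_{u\alpha}\delta_{k\beta}\, \tilde{f}_{\gamma\delta}(k, n-1) + \delta_{u\gamma}\delta_{k\delta}\, \tilde{f}_{\alpha\beta}(k, n-1) + \tilde{f}_{\alpha\beta}(k, n-1)\, \tilde{f}_{\gamma\delta}(k, n-1), \qquad n = 2, 3, \ldots, \qquad (6.1.24a)$$
$$= \delta_{u\alpha}\delta_{k\beta}\delta_{\alpha\gamma}\delta_{\beta\delta}, \qquad n = 1. \qquad (6.1.24b)$$
Let
$$T_{\alpha\beta\gamma\delta}(u, n) = E\bigl[\tilde{f}_{\alpha\beta}(u, n)\, \tilde{f}_{\gamma\delta}(u, n)\bigr]. \qquad (6.1.25)$$
Then, using equation (6.1.24), it is seen that $T_{\alpha\beta\gamma\delta}(u, n)$ satisfies the
equations
$$T_{\alpha\beta\gamma\delta}(u, n) = \delta_{u\alpha}\delta_{\alpha\gamma}\delta_{\beta\delta}\, p_{\alpha\beta} + \delta_{u\alpha}\, p_{\alpha\beta}\, \bar{f}_{\gamma\delta}(\beta, n-1) + \delta_{u\gamma}\, p_{\gamma\delta}\, \bar{f}_{\alpha\beta}(\delta, n-1) + \sum_{k=1}^{N} p_{uk}\, T_{\alpha\beta\gamma\delta}(k, n-1), \qquad n = 2, 3, \ldots, \qquad (6.1.26a)$$
$$= \delta_{u\alpha}\delta_{\alpha\gamma}\delta_{\beta\delta}\, p_{\alpha\beta}, \qquad n = 1. \qquad (6.1.26b)$$
We shall show that
$$T_{\alpha\beta\gamma\delta}(u, n) = \delta_{\alpha\gamma}\delta_{\beta\delta}\, \bar{f}_{\alpha\beta}(u, n) + \sum_{k=1}^{n-1} \bigl[ p_{u\alpha}^{(k-1)} p_{\alpha\beta}\, \bar{f}_{\gamma\delta}(\beta, n-k) + p_{u\gamma}^{(k-1)} p_{\gamma\delta}\, \bar{f}_{\alpha\beta}(\delta, n-k) \bigr], \qquad n = 2, 3, \ldots, \qquad (6.1.27a)$$
$$= \delta_{\alpha\gamma}\delta_{\beta\delta}\, \bar{f}_{\alpha\beta}(u, 1), \qquad n = 1, \qquad (6.1.27b)$$
in which case equation (6.1.22) follows.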





It is clear from (6.1.17b) that (6.1.27b) equals (6.1.26b). The case
$n = 2, 3, \ldots$ will be proven by induction. For $n = 2$, it is easily verified that
(6.1.26a) is equal to the expression in (6.1.27a). Assume (6.1.27a)
satisfies (6.1.26a) for $n$. Then, using (6.1.14),


Toa glteDtl) = 6, Go,8..Pig + 6, PastaglOen) + 6 Doe (Bon) 


Bl mi , 
a m) (rete m) 
t EP Sabps & ca, Pap * t "Etat 


(n-lem) 
teey Pl (Bam) + AC Pigéns (8pm) | 


Pants 


= 6. ¥°s6° auPup * 8 Pastas (583) a SyoPusk yo Bend 


ie 2 
a Pan Pap * - "ECan Eg(Bem) + aay PyptaplOom)? 
a Yh cas aS 
(6.4.28) 


proving the induction.
If $P$ is ergodic and has distinct eigenvalues, equations (6.1.15) and
(6.1.20) for the spectral representations of $\bar{f}_{ij}(u, n)$ and $p_{ij}^{(n)}$
can be used in (6.1.27) to obtain, for $n = 2, 3, \ldots$,


0 n 
fut + © a a”) ) 


cowl fF opt hv © pigtaw, + z = ae KPa Sag = Y gee toa” 
a 


rei % is BH Gon® 
* Pasty FE he eee ceTT + 5 = ab) yea ars 2%, ald?) 


(6.1.29) 
Multiplying out equation (6.1.29) and using the relations 


4 e = Fs) 
. yoy Tete) A Pie, Tae 


re Cs ccog Bh (662590 
tered ni (ink)? Zs . ( 3 } 
















a (leAg) 9 P25 v8 eg 83 (6.2.33) 
mi at x fared _ ymedy 
= wiamacay € a Me, Se (601-32) 
gad | he «» N, 
jpmee, eoeg 
j#n 
and 
z be) 
neh tenn + (mela 
2: y tedek) (ay ) & = 9 MSEy vecg N (6.1.33) 
oo a te) 


equation (6.1.23) is obtained. Q.E.D.


In Theorems 6.1.2 and 6.1.3 the spectral representations provide an
efficient method of computing the means, variances, and covariances of
elements of $\tilde{F}$. This method is particularly useful as the parameter $n$
becomes large and, in fact, leads to relatively simple approximations for
$\bar{f}_{ij}(u, n)$ and $\operatorname{cov}[\tilde{f}_{\alpha\beta}, \tilde{f}_{\gamma\delta}]$ when $n$ is sufficiently large, as is shown
in the following corollary.


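Corollary 6.1.4 If the $N \times N$ random matrix $\tilde{F}$ has the Whittle
distribution with parameter $(u, n, P)$, where $P$ is ergodic and has distinct
eigenvalues, then, for large $n$, the expected value of $\tilde{f}_{ij}$ and the covariance
between $\tilde{f}_{\alpha\beta}$ and $\tilde{f}_{\gamma\delta}$ are given by the following asymptotic expressions:
$$\bar{f}_{ij}(u, n) \approx p_{ij} \Bigl( n \pi_i + \sum_{m=2}^{N} \frac{a_{ui}^{(m)}}{1 - \lambda_m} \Bigr), \qquad i, j = 1, \ldots, N, \qquad (6.1.34)$$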
23 43 4 ge? Pod, 







y (m) (2) 
= + WW. 
nl TePaplayopa > MaPap™yPya * PapPys 2 “ely © Uvlta 
ie). 
FN (x) Lr aly & (rm), oT (ae + Bia) 


* Bopha Sas i mao th,” PapPys *, rs 





;  g(@, (3), fm) aid? (m) (4) 
H Ny Bis. dy » & 
> Fea? S > vy Sia Shy * Say 8a 7 Son Say | (6.1.35) 
YW m2 9u2 (19 ,) (toh) 


ebgiys6 ie i, eeag 8 


Proof. Equations (6.1.34) and (6.1.35) are obtained by letting $n$
become large in (6.1.15) and (6.1.23), dropping terms of order
$\lambda_m^n$ ($m = 2, \ldots, N$), and noting that
$$\lim_{n \to \infty} n \lambda_m^n = 0, \qquad m = 2, \ldots, N. \qquad (6.1.36)$$
Q.E.D.


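6.1.3 The Whittle-1 Distribution. Let $\tilde{u}$ be a random integer with
range set $\{1, \ldots, N\}$ and let $\tilde{F} = [\tilde{f}_{ij}]$ be an $N \times N$ random matrix with
range set $\Phi_N(u, n, P)$, as defined by (6.1.6). The ordered pair $(\tilde{u}, \tilde{F})$ is said to have the Whittle-1
distribution with parameter $(p, n, P)$ if $(\tilde{u}, \tilde{F})$ has the joint probability mass
function
$$f_{W1}^{(N)}(u, F \mid p, n, P) = p_u\, f_W^{(N)}(F \mid u, n, P), \qquad u = 1, \ldots, N;\ F \in \Phi_N(u, n, P),$$
$$= 0, \qquad \text{otherwise}, \qquad (6.1.37)$$
where $f_W^{(N)}$ is the Whittle mass function (6.1.7), $p = (p_1, \ldots, p_N)$ is a stochastic row vector, $n = 1, 2, 3, \ldots$, and
$P$ is an $N \times N$ stochastic matrix.
Since $p_u \ge 0$ and $\sum_{u=1}^{N} p_u = 1$, it is clear that
$$f_{W1}^{(N)}(u, F \mid p, n, P) \ge 0$$
and, using equation (6.1.9),
$$\sum_{u=1}^{N} \sum_{F \in \Phi_N(u, n, P)} f_{W1}^{(N)}(u, F \mid p, n, P) = 1. \qquad (6.1.38)$$
It is readily seen that, if $(\tilde{u}, \tilde{F})$ has the Whittle-1 distribution with
parameter $(p, n, P)$, the marginal distribution of $\tilde{u}$ is
$$P[u \mid p] = p_u, \qquad u = 1, \ldots, N,$$
$$= 0, \qquad \text{otherwise}. \qquad (6.1.39)$$
The marginal distribution of $\tilde{F}$ is considered in the remaining paragraphs
of this section.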

6.1.4 The Whittle-2 Distribution. Let $\tilde{F}$ be an $N \times N$ random matrix
with range set
$$\Phi_N(n, P) = \bigcup_{u=1}^{N} \Phi_N(u, n, P), \qquad n = 1, 2, \ldots. \qquad (6.1.40)$$
The Whittle-2 distribution with parameter $(p, n, P)$ is defined as the
marginal distribution of $\tilde{F}$ when $(\tilde{u}, \tilde{F})$ has the Whittle-1 distribution with
parameter $(p, n, P)$:
$$f_{W2}^{(N)}(F \mid p, n, P) = \sum_{u=1}^{N} f_{W1}^{(N)}(u, F \mid p, n, P), \qquad F \in \Phi_N(n, P),$$
$$= 0, \qquad \text{otherwise}, \qquad (6.1.41)$$
where $p$ is an $N$-dimensional stochastic row vector, $n = 1, 2, \ldots$, and $P$ is an $N \times N$ stochastic matrix.
It is clear from the definition (6.1.41) and the fact that
$f_{W1}^{(N)}(u, F \mid p, n, P)$ is a probability mass function that
$$f_{W2}^{(N)}(F \mid p, n, P) \ge 0$$
and
$$\sum_{F \in \Phi_N(n, P)} f_{W2}^{(N)}(F \mid p, n, P) = 1.$$
Before deriving an explicit formula for $f_{W2}^{(N)}(F \mid p, n, P)$, a preliminary
lemma is required. To this end let $\Phi_N(n, P)$ be partitioned into two
sets, $\Phi_{N1}(n, P)$ and $\Phi_{N2}(n, P)$, defined as
$$\Phi_{N1}(n, P) = \{ F \mid F \in \Phi_N(n, P),\ f_{i\cdot} = f_{\cdot i},\ i = 1, \ldots, N \}, \qquad (6.1.42)$$
$$\Phi_{N2}(n, P) = \Phi_N(n, P) - \Phi_{N1}(n, P). \qquad (6.1.43)$$
$\Phi_{N1}(n, P)$ is the set of all transition counts which start and end in the
same state and $\Phi_{N2}(n, P)$ is the set of all other transition counts in
$\Phi_N(n, P)$. Both sets are nonempty.
Lemma 6.1.5 Let the sets $\Phi_{N1}(n, P)$ and $\Phi_{N2}(n, P)$ be defined by
equations (6.1.42) and (6.1.43). If $F \in \Phi_{N1}(n, P)$ there are exactly $N$ pairs
of integers, $(x, y) = (u, u)$, $u = 1, \ldots, N$, which satisfy the equations
$$f_{i\cdot} - f_{\cdot i} = \delta_{ix} - \delta_{iy}, \qquad i = 1, \ldots, N. \qquad (6.1.44)$$
If, on the other hand, $F \in \Phi_{N2}(n, P)$ there is a unique solution, $(x, y) = (u, v)$,
where $u \ne v$, to (6.1.44).
+ | 
Proof. If $F \in \Phi_{N1}(n, P)$, then
$$f_{i\cdot} - f_{\cdot i} = 0, \qquad i = 1, \ldots, N,$$
and (6.1.44) becomes
$$\delta_{ix} = \delta_{iy}, \qquad i = 1, \ldots, N.$$
These equations are satisfied by $x = y = u$ ($u = 1, \ldots, N$) and are not
satisfied by any pair $(x, y)$ such that $x \ne y$.
If $F \in \Phi_{N2}(n, P)$ there is, by the definition of $\Phi_{N2}(n, P)$, at least
one solution, $(u, v)$, to (6.1.44) with $u \ne v$. Assume $(u', v')$ also
satisfies (6.1.44). If $u' = u$, Lemma 6.1.1 implies $v' = v$. Assume
$u' \ne u$. Then, if $v' \ne v$, substitution of $(u, v)$ in (6.1.44) yields, for
$i = v$,
$$f_{v\cdot} - f_{\cdot v} = -1,$$
while $(u', v')$ substituted into (6.1.44) gives
$$f_{v\cdot} - f_{\cdot v} = \delta_{vu'},$$
a contradiction. If $v' = v$, then $(u, v)$ substituted into (6.1.44) with
$i = u$ implies
$$f_{u\cdot} - f_{\cdot u} = 1$$
and, since $v' = v \ne u$, $(u', v')$ substituted into (6.1.44) yields
$$f_{u\cdot} - f_{\cdot u} = \delta_{uu'},$$
which contradicts the assumption $u' \ne u$. Thus, $(u', v') = (u, v)$. Q.E.D.
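Lemma 6.1.5 can be illustrated by listing every pair of endpoints consistent with a given transition count. This is a minimal sketch with a hypothetical function name and states numbered from 1.

```python
def endpoint_pairs(F):
    """All pairs (x, y) with f_i. - f_.i = delta_ix - delta_iy for every state i."""
    n = len(F)
    # Row sum minus column sum for every state.
    diff = [sum(F[i]) - sum(F[k][i] for k in range(n)) for i in range(n)]
    return [(x, y)
            for x in range(1, n + 1)
            for y in range(1, n + 1)
            if all(diff[i - 1] == (i == x) - (i == y) for i in range(1, n + 1))]
```

A count with equal row and column sums, such as the one produced by the closed path 1, 2, 3, 1, admits the $N$ pairs $(u, u)$. The count produced by the open path 1, 2, 3 admits only $(1, 3)$.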
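Theorem 6.1.6 Let $\tilde{F}$ be an $N \times N$ random matrix with range set
$\Phi_N(n, P)$ which has the Whittle-2 distribution with parameter $(p, n, P)$.
Then the probability mass function of $\tilde{F}$ is given by
$$f_{W2}^{(N)}(F \mid p, n, P) = \Bigl( \sum_{x=1}^{N} p_x F^*_{xx} \Bigr) \frac{\prod_{i=1}^{N} f_{i\cdot}!}{\prod_{i=1}^{N}\prod_{j=1}^{N} f_{ij}!} \prod_{i=1}^{N}\prod_{j=1}^{N} p_{ij}^{f_{ij}}, \qquad F \in \Phi_{N1}(n, P),$$
$$= p_u F^*_{vu}\, \frac{\prod_{i=1}^{N} f_{i\cdot}!}{\prod_{i=1}^{N}\prod_{j=1}^{N} f_{ij}!} \prod_{i=1}^{N}\prod_{j=1}^{N} p_{ij}^{f_{ij}}, \qquad F \in \Phi_{N2}(n, P),$$
$$= 0, \qquad \text{otherwise}, \qquad (6.1.45)$$
where $F^*_{xy}$ is the $(x, y)$th cofactor of the matrix $F^*$ defined by equation
(6.1.8) and, when $F \in \Phi_{N2}(n, P)$, $(u, v)$ is the unique solution to equation
(6.1.44).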
Proof. By definition, $\Phi_{N1}(n, P)$ and $\Phi_{N2}(n, P)$ are mutually
exclusive sets and together exhaust the range set, $\Phi_N(n, P)$. If
$F \in \Phi_{N1}(n, P)$ then, by Lemma 6.1.5, $F \in \Phi_N(u, n, P)$, $u = 1, \ldots, N$, and
$$f_W^{(N)}(F \mid u, n, P) > 0, \qquad u = 1, \ldots, N,$$
which yields the first line of equation (6.1.45). If $F \in \Phi_{N2}(n, P)$, Lemma
6.1.5 implies there is exactly one value of $u$ in the range $\{1, \ldots, N\}$
such that
$$f_W^{(N)}(F \mid u, n, P) > 0,$$
which yields the second line of (6.1.45). Q.E.D.


6.1.5 Moments of the Whittle-2 Distribution. In this paragraph we
derive formulas for the expected value of $\tilde{f}_{ij}$ and for the covariance
between $\tilde{f}_{\alpha\beta}$ and $\tilde{f}_{\gamma\delta}$ when $\tilde{F}$ has the Whittle-2 distribution with parameter
$(p, n, P)$. When $P$ is ergodic and $p = \pi$, the steady state distribution
corresponding to $P$, particularly simple formulas result. Related moments
have been derived by other authors. Anderson and Goodman [1], assuming
that many Markov chains which are governed by the same transition matrix
are simultaneously observed, find expressions for the means, variances,
and covariances of $n_{ij}(t)$, the number of systems making a transition from
state $i$ to state $j$ on the $t$th transition ($i, j = 1, \ldots, N$). Good [20] has
derived formulas for the mean vector and variance-covariance matrix of the
random vector $(\tilde{f}_{1\cdot}, \ldots, \tilde{f}_{N\cdot})$, where
$$\tilde{f}_{i\cdot} = \sum_{j=1}^{N} \tilde{f}_{ij}$$
is the number of times the system is observed to be in state $i$ (including
the initial state, but not the final state) in a sample of $n$ consecutive
transitions, assuming that the distribution of the initial state is $\pi$,
the steady state distribution corresponding to $P$. Our equations (6.1.53)
and (6.1.54), when summed over $j$ and over $\beta$ and $\delta$, respectively, reduce
to Good's formulas.


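Theorem 6.1.7 Let the $N \times N$ random matrix $\tilde{F}$ have the Whittle-2
distribution with parameter $(p, n, P)$. Then the expected value of $\tilde{f}_{ij}$
is
$$E[\tilde{f}_{ij}] = p_{ij} \sum_{m=0}^{n-1} \sum_{u=1}^{N} p_u\, p_{ui}^{(m)}, \qquad i, j = 1, \ldots, N. \qquad (6.1.46)$$
If $P$ is ergodic and has distinct eigenvalues, $\lambda_1, \ldots, \lambda_N$, then $E[\tilde{f}_{ij}]$
has the spectral representation
$$E[\tilde{f}_{ij}] = p_{ij} \Bigl( n \pi_i + \sum_{m=2}^{N} \frac{1 - \lambda_m^n}{1 - \lambda_m} \sum_{u=1}^{N} p_u\, a_{ui}^{(m)} \Bigr), \qquad i, j = 1, \ldots, N, \qquad (6.1.47)$$
where the $N \times N$ matrices $A^{(m)} = [a_{ij}^{(m)}]$ are defined by (6.1.11).
Proof. Since
$$E[\tilde{f}_{ij}] = \sum_{u=1}^{N} p_u\, \bar{f}_{ij}(u, n), \qquad (6.1.48)$$
equation (6.1.46) follows immediately from equation (6.1.14). When $P$ is
ergodic with distinct eigenvalues, (6.1.47) follows from (6.1.15). Q.E.D.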





Theorem 6.1.8 Let the $N \times N$ random matrix $\tilde{F}$ have the Whittle-2
distribution with parameter $(p, n, P)$. Then, for $\alpha, \beta, \gamma, \delta = 1, \ldots, N$, the
covariance between $\tilde{f}_{\alpha\beta}$ and $\tilde{f}_{\gamma\delta}$ is





* net i (rmtele) 
cov Papofyg] * BF gi, Sag-Elh 91) + RS Cpapt yg (Bole) wey PoP as 


Hi Lol 
WZ — By 000 
eo of na, Sgn BlE De red (604089b) 


i¢ B ia ergodic with distine? eigenvaines, Nes coves ro and if 


pi”) = pa as ni? ) =i n”) 


where A) te defined ty equation (6.4.11), then (6.1.09a) has the spectral 
representation 
coul? 9 fg) = 


p WeBly ceag i (6.2298) 


R sea? (n), eC 
Paphay palin * Ep sat Me | “nats ® B £ a Tey +7b, 3) 








N 6) H 
+ 3 2 de ton, ) pm) 0 ™ we “nr, + : melenh, + Nip fh, as?) 
Erk 2 
( , (2 H Jeornh + (mei )A (m) ¢ } (rm) (m) 
- Tre. a\™) » 3 ros as (b. a3y > by Sag ) 
(2) (3) , 6 6) - mt Mel , met 
» * E a4 ona Ms nS ae gy © (6.1.58) 
me wt Inks tor, Ay Ay 
Emek. Since P 
ECP of} £ pon (sn), (6.3552) 


ep yo" unt @ oBys 





equation (6.1.49) follows immediately from (6.1.27). By using (6.1.52)
together with (6.1.50) and the spectral representation (6.1.29), equation
(6.1.51) is obtained. Q.E.D.





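Corollary 6.1.9 Let $\tilde{F}$ be an $N \times N$ random matrix which has the
Whittle-2 distribution with parameter $(\pi, n, P)$, where $P$ is ergodic and
$\pi$ is the steady state distribution corresponding to $P$. Then
$$E[\tilde{f}_{ij}] = n \pi_i\, p_{ij}, \qquad i, j = 1, \ldots, N, \qquad (6.1.53)$$
and, for $\alpha, \beta, \gamma, \delta = 1, \ldots, N$,
$$\operatorname{cov}[\tilde{f}_{\alpha\beta}, \tilde{f}_{\gamma\delta}] = n \pi_\alpha p_{\alpha\beta} \bigl( \delta_{\alpha\gamma}\delta_{\beta\delta} - n \pi_\gamma p_{\gamma\delta} \bigr) + \sum_{k=1}^{n-1} (n-k)\, p_{\alpha\beta}\, p_{\gamma\delta} \bigl( \pi_\alpha p_{\beta\gamma}^{(k-1)} + \pi_\gamma p_{\delta\alpha}^{(k-1)} \bigr), \qquad n = 2, 3, \ldots, \qquad (6.1.54a)$$
$$= \pi_\alpha p_{\alpha\beta} \bigl( \delta_{\alpha\gamma}\delta_{\beta\delta} - \pi_\gamma p_{\gamma\delta} \bigr), \qquad n = 1. \qquad (6.1.54b)$$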

L? P has distinet eigenvalues, equation (6.1.54a) has tho epostral, 
ropresentation, 
Sov (€ 9 £6] 3 


s nelend, + 1, (n) (2) 
ITD Pap 'Se,Sas baad T Paya) ~ Pag! 78 i ae ea TA py aa to }. 


BZ Ia ece (643-85) 


p a Ws EEO phe2o eee (644655) 
Ute . 46h, coes H 


Proof. Equation (6.1.53) follows immediately from (6.1.46). Using (6.1.53) in
equation (6.1.49a),


cov (fas js 


NT Pago dgg2 7 Dig) 4 Cr Papt (Belt) + 7 Pgkag( Bale) Js 
T1829 a vce (6.1.57a) 


= Taal Sag" MPa) rest (6.32570) 


Reting that, if n> i, 


etic vel ieod (mn) 
az wee} = F = 
tea fy ; eo 8) Pua Pag 


me) keantd 
mt (m) , 
ws Pag eal (neem) Sa r (6.1.58) 


equation (6.1.54) follows from (6.1.57).
If $P$ has distinct eigenvalues, then, by (6.1.12),


G@) oo =m 
b 33 a 82, seey RB (604259) 
J Tamas ° §=4 i, woes 


and, in this case, equation (6.1.54) reduces to (6.1.55). Q.E.D.
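The stationary-start moments can be checked by exhaustive enumeration on a small chain. The sketch below is illustrative: `stationary_moments` computes exact first and second moments of the transition counts by summing over all paths, and `cov_formula` evaluates the covariance in the form $n\pi_\alpha p_{\alpha\beta}(\delta_{\alpha\gamma}\delta_{\beta\delta} - n\pi_\gamma p_{\gamma\delta}) + \sum_{k=1}^{n-1}(n-k)\,p_{\alpha\beta}\,p_{\gamma\delta}\,(\pi_\alpha p_{\beta\gamma}^{(k-1)} + \pi_\gamma p_{\delta\alpha}^{(k-1)})$, which is a reading of (6.1.54) obtained by counting ordered pairs of transition epochs. Both names are hypothetical and states are indexed from 0.

```python
from fractions import Fraction
from itertools import product


def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def stationary_moments(P, pi, n):
    """Exact E[f_t] and E[f_t1 * f_t2] over all paths, initial state drawn from pi."""
    N = len(P)
    mean, second = {}, {}
    for path in product(range(N), repeat=n + 1):
        prob = pi[path[0]]
        counts = {}
        for a, b in zip(path, path[1:]):
            prob *= P[a][b]
            counts[(a, b)] = counts.get((a, b), 0) + 1
        for t1, c1 in counts.items():
            mean[t1] = mean.get(t1, 0) + prob * c1
            for t2, c2 in counts.items():
                second[(t1, t2)] = second.get((t1, t2), 0) + prob * c1 * c2
    return mean, second


def cov_formula(P, pi, n, a, b, g, d):
    """Covariance of the counts of a -> b and g -> d under a stationary start."""
    N = len(P)
    same = int(a == g and b == d)
    total = n * pi[a] * P[a][b] * (same - n * pi[g] * P[g][d])
    power = [[Fraction(int(i == j)) for j in range(N)] for i in range(N)]
    for k in range(1, n):
        # At this point `power` holds P^(k-1).
        total += (n - k) * P[a][b] * P[g][d] * (pi[a] * power[b][g] + pi[g] * power[d][a])
        power = mat_mul(power, P)
    return total
```

Enumeration costs $N^{n+1}$ paths, so it is useful only as a check on very small cases.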





6.2 The Multivariate Beta Distribution

In this section we consider the multivariate beta distribution, which
is an extension of the beta distribution to $N$ dimensions. There are
several different generalizations of the beta distribution; this particular
one is due to Mauldon [30]. The moments of this distribution have been
derived by Mosimann [32], who also relates the multivariate beta distribution
to the gamma distribution. Some of Mosimann's results are presented
here for the sake of completeness; the proofs, for the most part, are
original.








6.2.1 The Multivariate Beta Density Function. The random stochastic
vector, $\tilde{p} = (\tilde{p}_1, \ldots, \tilde{p}_N)$, is said to have the multivariate beta distribution
with parameter m if $\tilde{p}$ has the joint density function
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a, > 0, {325 eeeg H (6.2e2) 
and, if $\Gamma(x)$ is the gamma function and
$$M = \sum_{i=1}^{N} m_i, \tag{6.2.3}$$
the normalizing constant is
$$B_N(m) = \frac{\prod_{i=1}^{N} \Gamma(m_i)}{\Gamma(M)}. \tag{6.2.4}$$

It is to be noted that $f_\beta^{(N)}(p \mid m)$ is the joint distribution of N-1 of the
elements of $\tilde{p}$, the Nth element being determined by the constraint
$$\sum_{i=1}^{N} \tilde{p}_i = 1.$$

The following lemma provides an alternate representation of the
normalizing constant, $B_N(m)$.
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where B{m,n) 4s the beta function. 
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QebeDeo 
Since $B_N(m) > 0$, it is clear that
$$f_\beta^{(N)}(p \mid m) > 0$$
on the range set of $\tilde{p}$. In the next theorem it is established that
$f_\beta^{(N)}(p \mid m)$ integrates to unity over this range set. It
then follows that the multivariate beta density function, as defined by
equation (6.2.1), is a proper density function.
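
As a numerical aside, the normalizing constant (6.2.4) can be evaluated stably through log-gamma functions. The sketch below is only illustrative: it assumes the density has the usual form $\prod_i p_i^{m_i - 1} / B_N(m)$ on the simplex, and the function names are hypothetical, not taken from this report.

```python
import math


def log_multivariate_beta(m):
    """Return log B_N(m) = sum_i log Gamma(m_i) - log Gamma(sum_i m_i)."""
    if any(mi <= 0 for mi in m):
        raise ValueError("all parameters m_i must be positive")
    return sum(math.lgamma(mi) for mi in m) - math.lgamma(sum(m))


def multivariate_beta_density(p, m):
    """Assumed density form: prod_i p_i^(m_i - 1) / B_N(m) on the open simplex."""
    # Outside the simplex (or on its boundary) the density is taken to be zero.
    if abs(sum(p) - 1.0) > 1e-12 or any(pi <= 0 for pi in p):
        return 0.0
    log_kernel = sum((mi - 1.0) * math.log(pi) for pi, mi in zip(p, m))
    return math.exp(log_kernel - log_multivariate_beta(m))
```

For N = 2 the constant reduces to the ordinary beta function, which gives a quick sanity check of the implementation.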
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Je cpl wap ats (6.2.3) 
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4a the univeriate beta density function and (6.2.8) helds. Assume 
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Let us meke the integrend transformation 
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Thea methed by which Theorem 6.2.2 was proved can be generalized to 
provide an identity which will be useful in subsequent proofs.
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6.2.2 The Multivariate Bete Dintritetio: 





If the random stochastic vector $\tilde{p}$ has the multivariate beta
distribution with parameter m, then the multivariate beta distribution
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funetion is denoted F 5 CD] de Wheres If p = (pig soon Bde 


rep | my 3 P CB, < E Dyn coos a. <p! 
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. f a fp 4 oa | adage (6.2.48) 


As is the case with the beta distribution function, there is no closed
expression for equation (6.2.18). We can, however, express F(p | m)
as an (N-1)-fold infinite sum. The case N = 3 is illustrated here.
For notational simplicity, let the parameter vector be
B® (aphey)o (6.2.19) 
Then 
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The double infinite series of (6.2.27) is F(ieveseBpcti, Btls Pye Po) 
Appell's second hypergeometric function of two variables [2]. Appell
has shown that the double series of (6.2.27) converges absolutely whenever
$p_1 + p_2 < 1$. Thus, we have
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Tho question of convergence of (6.2.27) on the boundary of the region 
$p_1 + p_2 < 1$ has not yet been resolved. We note, however, that equation
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The nonstendardized multivariate bate distribution is obtained fron 
equation (6.2.4) by making the transformation
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We now show thet the marginal and cerditdonal distritutions of v - the 
elements of $\tilde{p}$ ($\nu = 1, 2, \ldots, N-2$) are, respectively, multivariate beta and
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Theoren S24 If the remion dimensional etochastic vector PD 
has the multivariate beta distribution with parameter m, then, for
$\nu = 1, \ldots, N-2$, the marginal distribution of $(\tilde{p}_1, \ldots, \tilde{p}_\nu)$ is multivariate
beta,
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Then, by Theorem 6.2.3, the Joint density Ametion ef (q (v), H) 42 
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ami the conditional density function of (Bs «+s. #,) has tho kerne! 

us pt (iebiv) E p,) pi (6.2.45) 
which is the kernel of the nonstandardized multivariate beta density
function,
ead s0) | tabu), A7(y)). eB oDe 


6.2.5 Moment Formulas. The moments of the multivariate beta
distribution are most easily derived as a special case of the moments of
the matrix beta distribution, to be considered in Section 6.3. These
results are stated here and proved in the next section.

Let the random stochastic vector $\tilde{p} = (\tilde{p}_1, \ldots, \tilde{p}_N)$ have the multivariate
beta distribution with parameter $m = (m_1, \ldots, m_N)$. Then, if
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4) igher moments can be computed from the recurrence relation 
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where the V, are nomnegative integers, « is any irsies such that v, > 06 
and $T_\alpha(m)$ is the vector m with the element $m_\alpha$ increased by unity.
In matrix form, (6.2.46) - (6.2.48) can be summarized as the mean
vector,
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The X 2 rerdom generalised steciustic matrix, ¢ = e te Sa said 
to have the matrix beta distribution with parameter $M = [m_{ij}]$ if it
has the joint density function
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nsional vector. The total mubsr of rows of toth & ana % 
is K. To be quite general, we admit the possibility that $K_i = 0$
for some i.
The matrix beta distribution is the joint distribution of K(N-1)
random variables, $\tilde{p}_{ij}$. The remaining K elements of the matrix are determined by
the relations
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Inspection of equation (6.3.4) shows that the matrix beta density 
function is the product of K multivariate beta density functions,
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beta distributions is contimous in 7 . In Theoren 2.2.1 4¢ was shown 
that the matrix beta distribution is the natural conjugate distribution
for a Markov chain which is observed under the consecutive sampling rule.
It then follows that the family of matrix beta distributions is closed
under consecutive sampling. This property is used in the following theorem
to derive the moments of the matrix beta distribution.
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The following theoren provides a recursive forma for commting this 
expectation. 
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Proof. ‘The theoren follows immediately by applying Leame 
equation (6.3.18). Q.E.D.


Since the multivariate beta distribution is a special case of the
matrix beta distribution in which K = 1, we immediately have the following
corollary.
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have the maltivariate bete dletribation with paramctor m= (m, » Ay Bide 
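Then, with $M = \sum_{i=1}^{N} m_i$ as in (6.2.3),
$$E[\tilde{p}_i] = \bar{p}_i = \frac{m_i}{M}, \qquad i = 1, \ldots, N \tag{6.3.20}$$
$$\operatorname{var}[\tilde{p}_i] = \frac{m_i (M - m_i)}{M^2 (M + 1)}, \qquad i = 1, \ldots, N \tag{6.3.21}$$
$$\operatorname{cov}[\tilde{p}_i, \tilde{p}_j] = -\frac{m_i m_j}{M^2 (M + 1)}, \qquad i \ne j \tag{6.3.22}$$
and
$$E\Big[\prod_{i} \tilde{p}_i^{\nu_i} \,\Big|\, m\Big] = \frac{m_\alpha}{M}\, E\Big[\tilde{p}_\alpha^{\nu_\alpha - 1} \prod_{i \ne \alpha} \tilde{p}_i^{\nu_i} \,\Big|\, T_\alpha(m)\Big] \tag{6.3.23}$$
where the $\nu_i$ are nonnegative integers, $\alpha$ is any index such that $\nu_\alpha > 0$,
and $T_\alpha(m)$ is the vector m with the element $m_\alpha$ increased by unity.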
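
The moment formulas of this corollary lend themselves to a short exact-arithmetic check. The following Python sketch is an illustration only: it assumes the standard mean, variance and covariance expressions for the multivariate beta distribution and a recurrence that lowers one exponent while raising the matching parameter by unity; the function names are hypothetical.

```python
from fractions import Fraction


def multivariate_beta_moments(m):
    """Return (mean vector, covariance matrix) for parameter m, exactly."""
    m = [Fraction(x) for x in m]
    M = sum(m)
    mean = [mi / M for mi in m]
    denom = M * M * (M + 1)
    n = len(m)
    cov = [[(m[i] * (M - m[i]) if i == j else -m[i] * m[j]) / denom
            for j in range(n)] for i in range(n)]
    return mean, cov


def product_moment(nu, m):
    """E[prod_i p_i^nu_i | m] by repeatedly applying the recurrence.

    Each step picks an index a with nu_a > 0, multiplies by m_a / M,
    then lowers nu_a by one and raises m_a by one (the map T_a).
    """
    m = [Fraction(x) for x in m]
    nu = list(nu)
    result = Fraction(1)
    while any(nu):
        a = next(i for i, v in enumerate(nu) if v > 0)
        result *= m[a] / sum(m)
        m[a] += 1
        nu[a] -= 1
    return result
```

With m = (1, 2, 3), for instance, the second moments obtained from `product_moment` can be compared with the variance and covariance returned by `multivariate_beta_moments`.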


Before considering the marginal and conditional distributions of
submatrices of the matrix, it is necessary to define the nonstandardized matrix
beta distribution. Let
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ef se tok |e, >) is the monetandardiced multivariate hota disteitmmiion 
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heares Oste% Let the X x N random generalised atochastic matrix ¢§ 
have the matrix bete distribution with parameter 2H . Them, for P Rly seep K 
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Brora Galtei Lot (2 \in. ~) be the probability density function 
defined by emmation (2.363) ant let %g be the corresponiing extended 
natural conjugate fadlly of distributions. Let C(71,) bo the normalising 
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Ci) = { Bim prem, IBCra get, 928U, ‘b2¢10, )3(m gt tom, tBKa, 9 2) Birt 29m) | wt 
(6.5.9} 


Equation (6.4.2) then yields the means,
Blmy+ For) B(asem, 7=28(ery *2ymo)imyttyn,) *B(my +2519) B(n,¢251%,) 


<= 








my) = 
2 B(a q?2s8p) Blast, Jm2B yA pry Bimyehym, ) (my atin) B(ta.+ 2pm, ) 
{6006903 
Bag t2 ama) Engram, )o28(m, ri gr) Bimst2ym, Bling sma) Ringr3om,) 
qt) = _— hair Pert eect Oe gar ne Tae Ree “eer ag seeegt wae Tag : eae kane mC eee 


Bny+2stdp) Blast, m2 m4 IB (mom, 48m, 9f2, p) atm F250, ) 
C6.%.49} 
From (6.4.3) we obtain the covariance
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cov (3 @|¥) = 
seyy | De PemePBkngttany nAolnyPomsdStngt Pony DhnyetansdACRGA IA) _ oy 
B(mg+Fez) (Maat Jo23(mgt2ptin)B(ma>h om, OD y+ 9rg/ Bayt Sym, ) 
(6.3.42) 
and it is seen that there is non-zero correlation between the rows of the matrix.

Let $\pi(P) = (\pi_1(P), \ldots, \pi_N(P))$ be the steady-state probability vector
corresponding to the N × N stochastic matrix P and let $\nu = (\nu_1, \ldots, \nu_N)$
be a vector of nonnegative integers. An extended natural conjugate
distribution for $\tilde{P}$ which is required for the analysis to be carried out
in Chapter 7 is formed by letting

az)y)o JU (mgt Bed ° 








| 2 Q, otherwise (604.43) 
Let $M = [m_{ij}]$ be an N × N matrix of positive elements and let $m_i$
denote the ith row of M (i = 1, ..., N). Then the N × N random stochastic
matrix $\tilde{P}$ is said to have the matrix beta-1 distribution with parameter
$(M, \nu)$ if $\tilde{P}$ has the joint probability density function





(mq) ' 7 8 Hi 7) a 4 
Sigg (BI ior = Now) TU Jt Cag * my Bed.” 
B Op pthormilge (6.4.48) 


where $B(m_i)$ is defined by equation (6.2.4). The normalizing constant
$W(M, \nu)$ is the reciprocal of $E\big[\prod_{i=1}^{N} \pi_i^{\nu_i}(\tilde{P})\big]$ when $\tilde{P}$ has the matrix beta
distribution with parameter M,

, Fe 


y ‘ty (Hs),. sie 
iiQjev) = nn Jt my Me ymar  (6ut35) 
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Wik,v) can be computed using the motheds of Section 4.2, mt this requires 








lengthy calculations.
Ceing 4.2.4, 4% ie easily seen that 


(8) 
vA fega (2 | Moyi@k @ 2. (6062.6) 


The first two moments of the distribution are obtained from equations
(6.4.2) and (6.4.3),


gfe | uy} = “a 3e ws y) Lp J@h, ceop WH (Geltel?) 
a3 Sle wet, Wey 5) e JP, & fi 
n Of WM, y) 
ctrytaly te Twee MD _ 0 
~ M iH wit CT Gi » 
¥ v5 ap me GpPay,S a iy ec5og N 


where i, = 5, M, F and on 5? is the mateiz M with its (4,9)th elesient 
increased by unity.

Due to the complex calculations required to obtain the normalizing
constant $W(M, \nu)$, the matrix beta-1 distribution is presently of limited
usefulness. This distribution is, however, of some importance since it
is the natural conjugate distribution for one of the data-generating


6.5 The Beta-Whittle Distribution.

The beta-Whittle distribution is defined to be the unconditional
distribution of the transition count $\tilde{F}$ of a Markov chain with transition
probability matrix $\tilde{P}$ which is drawn from a matrix beta distribution. The
beta-Whittle-2 distribution is defined in an analogous fashion. In this
section explicit probability mass functions are derived for these
distributions and their moments are discussed.





For fixed mn amd v (u,teh, eoe 1) 
and fiowd n (e721, 25 3p seeds Let 


H 8 
,,(ayv9n) 2 fe £43 eis t por oF eg f « fa ra) on a 6. 
(423, eee9g nyt (6.564) 
and let 
N 
P ayn) G ( ) Pp (u,¥_n)}. uel» eaeg et (6.5.2) 
wag A TRA eda Ipece 


$\Phi_u(n)$ is the set of all possible transition counts, F, which can arise
from a sample of n consecutive transitions in a Markov chain with initial
state u and a positive transition probability matrix.

The beta-Whittle probability mass function with parameter (u, n, M) is
defined as

$$f_{\beta W}(F \mid u, n, M) = \int f_W(F \mid u, n, P)\, f_{M\beta}(P \mid M)\, dP, \qquad F \in \Phi_u(n)$$
$$= 0, \qquad \text{elsewhere} \tag{6.5.3}$$
where u = 1, ..., N, n = 1, 2, 3, ..., and $M = [m_{ij}]$ is an N × N matrix such
that $m_{ij} > 0$ (i, j = 1, ..., N).

When $\tilde{F}$ has the beta-Whittle distribution with parameter (u, n, M) it is
clear that $\tilde{F}$ must have the range set $\Phi_u(n)$, since the set of stochastic
matrices which have one or more elements equal to zero is a set of measure
zero relative to the matrix beta distribution.

It is seen from (6.5.3) that $f_{\beta W}(F \mid u, n, M) \ge 0$. By comparing (6.5.1)
and (6.1.5), it is seen that $\Phi_u(n) = \Phi(u, n, P)$, provided P is a
positive matrix. Since the set of nonpositive matrices P is a set of
measure zero and since $\Phi_u(n)$ is a finite set, we have
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» oe (F | Usha) = shane, ae raed 63 | tapmpP P) eer ce| x BAP 
Fe P, (up) by Fe Pysteben 
3 fi, (65564) 





Thus, the beta-Whittle mass function is a proper probability mass function.


Theorem 6.5.1 The beta-Whittle mass function with parameter (u, n, M)
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ra pe CE ) ten) ee Fe } (ayn) 
sar $n 25g 
S 0, elsewhere (6.5.5) 
where 1, vs) zm, 0 Bimy) te the beta unction, and v te tho wilqus 


sointion of £ equations 
S,. —~ £4 am Pon aad Sy 4&1, a#eop Bi 
mete Lotting R, Comte tha ith row of My 
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~ al aoe a ot. 
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Sat Jnl, ag! (6.526) 


The integrand is the kernel of a matrix beta density function with parameter
M + F. Hence, using (6.2.4),
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e@ (6.567) 
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si 3% 0 fy ay? 
Q.E.D.


The moments of the beta-Whittle distribution are somewhat complicated
to compute. Referring to equations (6.1.24) and (6.1.27), if $\tilde{F} = [\tilde{f}_{ij}]$
has the beta-Whittle distribution with parameter (u, n, M), then


(e j= Fs ELGG, gle Ao jh, eeng N (6.528) 


CF 53° 8, 5.) p CK, 
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arthe Bm larly» 
ELf.9 4 3 c 
gms “ahaa : * Af wie 
at% 1+ n 5 Ly 2 eat E Bee Bag] 


Py8.cn Poa Pap 


Gap ypSGly woog N (605.5e) 
BPS, 


Soy BS 


a 8 ogg tag te Sof vena se20 N (5.5.9) 


In both of these equations $E_\beta[\cdot]$ denotes the expectation operator relative
to the distribution $f_{M\beta}(P \mid M)$. These expectations can be evaluated by
repeated application of Lemma 2.3.2 in a manner which should, by now, be
familiar, but the calculations, particularly in (6.5.9), tend to become
extensive. Approximations of the sort we have discussed in Chapter 4 can
also be made. For small values of the parameter n, direct calculation of
the moments is probably the most convenient way to approach the problem.






















6.5.2 The Beta-Whittle-2 Distribution. The set of all possible
transition counts which can arise from a sample of n consecutive
transitions in a Markov chain with arbitrary initial state and a positive
transition probability matrix is


$$\Phi(n) = \bigcup_{u=1}^{N} \Phi_u(n), \qquad n = 1, 2, \ldots \tag{6.5.10}$$

The N × N random matrix $\tilde{F}$ with range set $\Phi(n)$ is said to have the standard
beta-Whittle-2 distribution with parameter (p, n, M) if $\tilde{F}$ has the probability
mass function





oe F| Potlolt oH) & Ee | pote?) tel Dae. (6.5642) 
é : 


where $p = (p_1, \ldots, p_N)$ is a stochastic vector which is functionally
independent of $\tilde{P}$, n = 1, 2, 3, ..., and $M = [m_{ij}]$ is an N × N matrix with
$m_{ij} > 0$ (i, j = 1, ..., N). It is readily established that $f_{\beta W2}(F \mid p, n, M) \ge 0$
and that

$$\sum_{F \in \Phi(n)} f_{\beta W2}(F \mid p, n, M) = 1.$$
Lot 

* sgn) zs tel Be" An)s £, = £4 Gets ooo ut (6.5442) 
and 

P afr) = G* tn) = PP Cn. (6.5.83) 
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= 0. eLehers (6.5.24) 
In (6.5.1), 4£ Fe P'_(n)y (ayv) de the unique solution to the equctions 


£.. ad Ss sed 80° Say" Als cosy f 


An important case in which p is not functionally independent of $\tilde{P}$
occurs when $p = \pi(\tilde{P})$, the steady-state vector corresponding
to $\tilde{P}$. In this instance we define the nonstandard beta-Whittle-2 distribution
with parameter (n, M) in terms of the following probability mass function,






aioe CELI) © PL” (P| ame) £ Oe war, (6.5.45) 


where n = 1, 2, 3, ... and $M = [m_{ij}]$ is an N × N matrix such that $m_{ij} > 0$
(i, j = 1, ..., N). The vector $\pi$ in the integrand of (6.5.15) is the
steady-state vector corresponding to P and is uniquely defined for all P
except a set of measure zero. It is clear that the range set of $\tilde{F}$ is
$\Phi(n)$ and that the function defined by (6.5.15) is a proper probability mass function.
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Phere On5ee If 
7% (Ws P| 7 (2) eee | yiape Why coop NB (605086) 
“ 
is the expected value of $\pi_i(\tilde{P})$, then the nonstandard beta-Whittle-2
probability mass function with parameter (n, M) is given by


Mf, oe, om) 
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e 0, elsexhera (6.5087) 


In (6.5.17), if $F \in \Phi(n)$, (u, v) is the unique solution to the equations
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“sd rele? [ree (E | SemB)e,, (P| Har. (6.5048) 


The kernel of the integrand of (6.5.18) is
re] £. ¢B 
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Tims, proceeding es in the proof ef Theorem 6.5el, 
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Equation (6.5017) follows Pron itis end Lemma 6628056 QeWebe 


rae ¢ 


N 
(E | asi) = i ‘nr Per | sen LO P\Pe wdp. (6.5.89) 


The moments of the standard beta-Whittle-2 distribution can be obtained
from the moments of the beta-Whittle distribution by using the relation
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fo10 & | Pete) = E | Pag GE | Somep- (6.50%) 


The moments of the nonstandard beta-Whittle-2 distribution are given by the
following theorem.


Theorem 6.5.3 Let $E[\cdot]$ denote the expectation operator relative
to the nonstandard beta-Whittle-2 distribution with parameter (n, M) and let
$E_\beta[\cdot]$ denote the expectation operator relative to the matrix beta
distribution with parameter M. Then
tf, 4; a np ca gPaglo So He coop KB (605024) 


and 
BLE pty) 5 6 ss aye fag] & es ig Pie LE chy Tr (WB ep 30° }3 


SgBsV_ Sy sean (60$.220) 
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Proof. The theoren follows dmnodtiately frew eqieticns (6.2.53) and 
(6.1.54), together with the relation

$$E[g(\tilde{F})] = E_\beta\big[E_{W2}[g(\tilde{F})]\big],$$

where g(F) is any function of F for which the expectation exists and $E_{W2}[\cdot]$
is the expectation operator relative to the Whittle-2 distribution with
parameter $(\pi, n, P)$. Q.E.D.





CHAPTER 7 
FIXED SAMPLE SIZE ANALYSIS


In Chapters 3-5 we examined some sequential sampling problems in a
Markov chain with alternatives. We now consider the prior-posterior and
preposterior analysis of a Markov chain governed by a fixed, but unknown,
N × N matrix of transition probabilities, $\tilde{P}$, from which a fixed number of
consecutive observations is drawn. In Section 7.1 this analysis is
carried out under the assumption that the initial state is known to the
decision-maker before the sample is observed. In Section 7.2 it is assumed
that the initial state is unknown and has a distribution which is functionally
independent of $\tilde{P}$; in the final section it is assumed that the chain is
operating in the steady state and that the initial state is unknown.





An N-state Markov chain can be considered to be a process which
generates the sequence of random variables, $\tilde{x}_0, \tilde{x}_1, \ldots, \tilde{x}_n, \ldots$, where
$\tilde{x}_i$ is the state of the system after the ith transition
(i = 1, 2, ...) and $\tilde{x}_0$ is the initial state observed before the
first transition. This initial state, $\tilde{x}_0 = \tilde{u}$, is subject to the
distribution $p = (p_1, \ldots, p_N)$, where p is a stochastic vector and
$p_i = P[\tilde{u} = i]$ (i = 1, ..., N). The transitions of the chain are governed by
the N × N stochastic matrix $P = [p_{ij}]$, where $p_{ij} = P[\tilde{x}_{n+1} = j \mid \tilde{x}_n = i]$
(i, j = 1, ..., N; n = 0, 1, 2, ...). It is assumed in this section that the
initial state is known to the decision-maker. Thus, in this case,

$$p_i = \delta_{iu}, \qquad i = 1, \ldots, N. \tag{7.1.1}$$





Let $x = (x_0, \ldots, x_n)$ be a sample
of n consecutive transitions in a Markov chain, where $x_0 = u$ is assumed
known to the decision-maker before the sample is obtained. Thus, x is
obtained under the consecutive sampling rule. Let $F = [f_{ij}]$ be the
transition count of x. Then the conditional probability, given $\tilde{P} = P$,
of observing the sample x is
$$p_{x_0 x_1}\, p_{x_1 x_2} \cdots p_{x_{n-1} x_n} = \prod_{i=1}^{N} \prod_{j=1}^{N} p_{ij}^{f_{ij}}. \tag{7.1.2}$$

If the stopping process is noninformative, then (7.1.2) is the kernel of
the likelihood of the sample. It is clear that the statistic F conveys
all the information of the sample and that, if stopping is noninformative,
F is a sufficient statistic.
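
The reduction of a sample to its transition count can be made concrete with a few lines of Python. This is a minimal sketch, assuming states are labelled 0, ..., N-1 (rather than 1, ..., N as in the text); the function names are illustrative only.

```python
def transition_count(path, n_states):
    """Return F = [f_ij], where f_ij counts the transitions i -> j in path."""
    F = [[0] * n_states for _ in range(n_states)]
    for i, j in zip(path, path[1:]):
        F[i][j] += 1
    return F


def likelihood_kernel(F, P):
    """Evaluate prod_i prod_j p_ij ** f_ij for a transition count F."""
    value = 1.0
    for f_row, p_row in zip(F, P):
        for f, p in zip(f_row, p_row):
            value *= p ** f
    return value
```

Two samples with the same initial state and the same transition count give the same kernel value, which is the sense in which F carries all the information in the sample.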

When the transition probability matrix is regarded as a random matrix
$\tilde{P}$, the natural conjugate of (7.1.2) is the matrix beta distribution defined
by equation (6.3.4) with $K_i = 1$ (i = 1, ..., N),

$$f_{M\beta}(P \mid M) \propto \prod_{i=1}^{N} \prod_{j=1}^{N} p_{ij}^{m_{ij} - 1}. \tag{7.1.3}$$


If $\tilde{P}$ has the matrix beta distribution with parameter $M' = [m'_{ij}]$ and if
a sample from the process yields a sufficient statistic F, Theorem 2.2.1
shows that the posterior distribution of $\tilde{P}$ is matrix beta with parameter

$$M'' = M' + F. \tag{7.1.4}$$


It is assumed
that $x_0 = u$ is known and that n, the number of transitions to be observed,
is determined before the sample is obtained. Prior to sampling, the
transition count $\tilde{F}$ is a random matrix and the conditional probability,
given $\tilde{P} = P$, that the Markov chain will generate a specific sample x
which has the transition count F is given by (7.1.2). Whittle [41] has shown
that the number of samples of size n with $x_0 = u$ which have the transition
count F is given by

$$\frac{\prod_{i=1}^{N} f_{i\cdot}!}{\prod_{i=1}^{N} \prod_{j=1}^{N} f_{ij}!}\; F^*_{vu} \tag{7.1.5}$$

where $f_{i\cdot} = \sum_{j=1}^{N} f_{ij}$ (i = 1, ..., N), v is the final state of the sample, and
$F^*_{vu}$ is the (v,u)th cofactor of the matrix $F^*$ defined by equation (6.1.5).
Thus, the conditional probability of F is given by the Whittle probability
mass function defined by (6.1.7),

$$P[F \mid u, n, P] = f_W(F \mid u, n, P). \tag{7.1.6}$$
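
Whittle's counting formula can be checked against brute-force enumeration for small chains. The Python sketch below is illustrative and rests on stated assumptions: states are labelled 0, ..., N-1, and the matrix $F^*$ is taken to have entries $\delta_{ij} - f_{ij}/f_{i\cdot}$ (with $\delta_{ij}$ when $f_{i\cdot} = 0$), which is the usual statement of the formula; the defining equation in this report is not reproduced here. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def determinant(A):
    """Determinant by cofactor expansion along the first row (small N only)."""
    if not A:
        return Fraction(1)
    total = Fraction(0)
    for j, a in enumerate(A[0]):
        if a != 0:
            minor = [row[:j] + row[j + 1:] for row in A[1:]]
            total += (-1) ** j * a * determinant(minor)
    return total


def end_state(F, u):
    """Final state v implied by F and initial state u, or None if infeasible."""
    N = len(F)
    d = [sum(F[i]) - sum(F[k][i] for k in range(N)) for i in range(N)]
    if all(x == 0 for x in d):
        return u
    if d[u] == 1 and sorted(d) == [-1] + [0] * (N - 2) + [1]:
        return d.index(-1)
    return None


def whittle_count(F, u):
    """Number of paths starting at u whose transition count is F."""
    N = len(F)
    v = end_state(F, u)
    if v is None:
        return 0
    rows = [sum(r) for r in F]
    # Assumed form of F*: delta_ij - f_ij / f_i. (delta_ij when f_i. = 0).
    Fstar = [[Fraction(int(i == j)) - (Fraction(F[i][j], rows[i]) if rows[i] else 0)
              for j in range(N)] for i in range(N)]
    # (v, u)th cofactor: delete row v and column u, then apply the sign.
    minor = [[Fstar[i][j] for j in range(N) if j != u] for i in range(N) if i != v]
    cofactor = (-1) ** (u + v) * determinant(minor)
    multinomial = Fraction(1)
    for i in range(N):
        multinomial *= factorial(rows[i])
        for j in range(N):
            multinomial /= factorial(F[i][j])
    return int(cofactor * multinomial)


def brute_force_count(F, u):
    """Count the same paths by enumerating every path of the right length."""
    N = len(F)
    n = sum(sum(row) for row in F)
    target = [list(row) for row in F]
    total = 0
    for tail in product(range(N), repeat=n):
        path = (u,) + tail
        G = [[0] * N for _ in range(N)]
        for i, j in zip(path, path[1:]):
            G[i][j] += 1
        total += (G == target)
    return total
```

The enumeration grows as N to the power n, so it is useful only as a check on small cases; the cofactor formula has no such limitation.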


If a sample of n consecutive transitions is obtained from a Markov
chain with known initial state u and if the transition matrix $\tilde{P}$ has the
matrix beta distribution with parameter $M'$, then the unconditional
distribution of the transition count $\tilde{F}$ is

$$D(F \mid u, n, M') = \int f_W(F \mid u, n, P)\, f_{M\beta}(P \mid M')\, dP. \tag{7.1.7}$$

It is seen from equation (6.5.3) that the unconditional distribution of
$\tilde{F}$ is the beta-Whittle mass function given by (6.5.5),

$$D(F \mid u, n, M') = f_{\beta W}(F \mid u, n, M'). \tag{7.1.8}$$
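
For small n the unconditional distribution of the transition count can be tabulated directly by enumerating paths and integrating out the transition matrix row by row. The sketch below assumes the standard identity that the matrix beta expectation of $\prod_{ij} p_{ij}^{f_{ij}}$ equals $\prod_i B(m_i + f_i)/B(m_i)$, where $m_i$ and $f_i$ are the ith rows of M and F; states are labelled from 0 and the function names are illustrative.

```python
import math
from itertools import product


def log_B(m):
    """log of the multivariate beta function B(m)."""
    return sum(math.lgamma(x) for x in m) - math.lgamma(sum(m))


def path_marginal_probability(F, M):
    """Expected value of prod_ij p_ij ** f_ij under a matrix beta prior M."""
    log_p = 0.0
    for f_row, m_row in zip(F, M):
        posterior_row = [m + f for m, f in zip(m_row, f_row)]
        log_p += log_B(posterior_row) - log_B(m_row)
    return math.exp(log_p)


def beta_whittle_pmf(u, n, M):
    """Tabulate the unconditional pmf of F by enumerating all paths from u."""
    N = len(M)
    pmf = {}
    for tail in product(range(N), repeat=n):
        path = (u,) + tail
        F = [[0] * N for _ in range(N)]
        for i, j in zip(path, path[1:]):
            F[i][j] += 1
        key = tuple(map(tuple, F))
        # Every path with this count contributes the same marginal probability.
        pmf[key] = pmf.get(key, 0.0) + path_marginal_probability(F, M)
    return pmf
```

Summing the tabulated values gives unity up to rounding, which mirrors the argument that the beta-Whittle mass function is proper.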
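If the prior distribution of $\tilde{P}$ is matrix beta with parameter $M'$ and
if a sample of size n yields a sufficient statistic F, then equations (7.1.4)
and (6.3.7) show that the mean of the posterior distribution is

$$\bar{P}'' = [\bar{p}''_{ij}], \tag{7.1.9}$$

where

$$\bar{p}''_{ij} = \frac{m'_{ij} + f_{ij}}{M'_i + f_{i\cdot}}, \qquad i, j = 1, \ldots, N, \tag{7.1.10}$$

with $M'_i = \sum_{j=1}^{N} m'_{ij}$ and $f_{i\cdot} = \sum_{j=1}^{N} f_{ij}$.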

Before observing the sample, the posterior mean $\bar{P}''$ is a random matrix which can take one of
a finite set of values in the range set $R(u, n, M')$. Let

$$S_u(\bar{P}) = \{F \mid F \in \Phi_u(n),\ \bar{P}''(F) = \bar{P}\} \tag{7.1.11}$$

be the set of possible transition counts which result in a posterior mean
with the value $\bar{P} \in R(u, n, M')$. Then, by (7.1.8), the distribution of the
posterior mean is given by the following probability mass function,

$$P[\bar{P}'' = \bar{P} \mid u, n, M'] = \sum_{F \in S_u(\bar{P})} f_{\beta W}(F \mid u, n, M'), \qquad \bar{P} \in R(u, n, M'). \tag{7.1.12}$$




7.2 Initial State Unknown.

We now assume that the initial state, $\tilde{x}_0 = \tilde{u}$, is unknown to the
decision-maker before the sample is observed, but that $\tilde{u}$ has a probability
distribution, $p = (p_1, \ldots, p_N)$, which is functionally independent of $\tilde{P}$
and which may or may not be known to the decision-maker.
It is also assumed that the utility of any terminal decision made after
x is observed depends only on P and not on p.













Let $x = (x_0, \ldots, x_n)$ be a
sample of n consecutive transitions in a Markov chain. Let $x_0 = u$ be the
initial state observed and let $F = [f_{ij}]$ be the transition count of the
sample. Then the conditional probability, given $\tilde{P} = P$ and $\tilde{p} = p$, of
observing the sample x is

$$p_u \prod_{i=1}^{N} \prod_{j=1}^{N} p_{ij}^{f_{ij}}. \tag{7.2.1}$$

If the stopping process is noninformative, then, since terminal utilities
depend only on P and not on p, the kernel of the likelihood of the sample
is

$$\prod_{i=1}^{N} \prod_{j=1}^{N} p_{ij}^{f_{ij}} \tag{7.2.2}$$

and F is a marginally sufficient statistic.

When the matrix of transition probabilities is treated as a random
matrix $\tilde{P}$, the natural conjugate of (7.2.2) is the matrix beta distribution
defined by (6.3.4) with $K_i = 1$ (i = 1, ..., N). If $\tilde{P}$ has the matrix beta
distribution with parameter $M' = [m'_{ij}]$ and if a sample from the process
yields a marginally sufficient statistic F, then the posterior distribution
of $\tilde{P}$ is matrix beta with parameter

$$M'' = M' + F. \tag{7.2.3}$$











It is assumed that n, the number of transitions to be observed, is determined
before the sample is obtained. Prior to sampling, the pair $(\tilde{u}, \tilde{F})$ is a
random quantity and the conditional probability, given $\tilde{P} = P$ and $\tilde{p} = p$,
that the Markov chain will generate a specific sample x with the
statistic (u, F) is given by (7.2.1). The number of samples of size n
with initial state u which have the transition count F is given by (7.1.5).
Therefore, the conditional probability of (u, F) is given by the Whittle-1
probability mass function defined by equation (6.1.37),

$$P[u, F \mid p, n, P] = f_{W1}(u, F \mid p, n, P). \tag{7.2.4}$$

The conditional distribution of the marginally sufficient statistic $\tilde{F}$ is
the Whittle-2 probability mass function given by equation (6.1.69),

$$P[F \mid p, n, P] = f_{W2}(F \mid p, n, P). \tag{7.2.5}$$





If a sample of n consecutive transitions is obtained from a Markov
chain where the distribution of the initial state is known to be p and
where the transition probability matrix $\tilde{P}$ has the matrix beta distribution
with parameter $M'$, then, provided p is functionally independent of $\tilde{P}$, the
unconditional distribution of the transition count $\tilde{F}$ is

$$D(F \mid p, n, M') = \int f_{W2}(F \mid p, n, P)\, f_{M\beta}(P \mid M')\, dP. \tag{7.2.6}$$

Thus, by equation (6.5.11), the unconditional distribution of $\tilde{F}$ is the
beta-Whittle-2 probability mass function given by (6.5.14),




(8) atghe 
OCE | Pomel) © foro | Botall") Soi 
f2 B is urkrew and has the prior Aisteibution fmotton Mp|1), whth 
men D4), then equations (7.2.7) and (6.5.9) show that tho 
tieamittiens] disteibation of F 4s alas bete-Uhittles2, 





o(F | Yonsit*) f en (F | Porsfl VaH(p It) 
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nce ca (F( oC 1) omehi?). (7.2.6) 
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re P haa tho matrix beta dlotritation with prlor parascter H° end 4¢ 
a sample yields the marginally sufficient statistic F, the mean of the
posterior distribution of $\tilde{P}$ is given by equations (7.1.9) and (7.1.10).
Prior to observing the sample, the posterior mean is a random matrix
with the finite range set $R^*(n, M')$. Let

stp) = {e| zed °c, Free} gest) (7.2.9 

be the set of possible transition alate Te tal, dl ae See 
PY a PeR (ngli*). ‘Thorty from (7+2-7)y wo find thats 4f p de knowm, the 
cisteitution of the rosterler mean ia given by the following probabilit: 
mass Luncticns, 





ete (8) | * 
=z] pega 2.5 2 GE] oomy zara 
oS & 


a Oo elsesbers (722040) 
WwAlariy, if 5 is uimowm ari has the prior dietyibatian fection A(pi1), 
the Gotwiintion of the rostemior mean is 


Sa a2 Cis >, gi? 7 i 
ee eS [eel ee, (2 fae E | BUY domey?) PeR (msi!) 
oO, olsethare (7 208k) 
7.3 Chain in the Steady State.





When the Markov chain is operating in the steady state and the initial
state, $\tilde{u}$, is unknown, the distribution of $\tilde{u}$ is $\pi(P) = (\pi_1(P), \ldots, \pi_N(P))$,
the steady-state probability vector associated with the transition matrix
P. In this case, observation of $\tilde{u}$ provides information about $\tilde{P}$








steady state. If u is the initial state and $F = [f_{ij}]$ is the transition
count of the sample, the conditional probability, given that $\tilde{P} = P$, of
observing the sample x is

$$\pi_u(P) \prod_{i=1}^{N} \prod_{j=1}^{N} p_{ij}^{f_{ij}}. \tag{7.3.1}$$

When stopping is noninformative, equation (7.3.1) is the kernel of the
likelihood of the sample and the ordered pair (u, F) is a sufficient statistic.
When $\tilde{P}$, the matrix of transition probabilities, is regarded as a
random matrix, the natural conjugate of equation (7.3.1) is the matrix beta-1
distribution defined by (6.4.12), $f_{M\beta 1}(P \mid M, \nu)$. It is easily seen that,
if $\tilde{P}$ has the matrix beta-1 distribution with parameter $(M', \nu')$ and if a
sample from the process yields a sufficient statistic (u, F), the posterior
distribution of $\tilde{P}$ is matrix beta-1 with parameter $(M'', \nu'')$, where, if
$e_u$ is an N-dimensional row vector with uth component equal to one and all
other components equal to zero,

$$M'' = M' + F, \qquad \nu'' = \nu' + e_u. \tag{7.3.2}$$
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As wis noted dn Section 6.4, the normalising constemt axl the momants 
of the matrix beta-1 distribution are difficult to compute. This difficulty
complicates the task of assigning a specific matrix beta-1 prior
distribution to $\tilde{P}$. Since the matrix beta distribution is also a matrix
beta-1 distribution with the parameter $\nu = (0, \ldots, 0) = 0$,

$$f_{M\beta}(P \mid M) = f_{M\beta 1}(P \mid M, 0), \tag{7.3.3}$$

it may be convenient for the decision-maker to use a matrix beta prior
distribution for $\tilde{P}$ and we shall assume this to be the case in discussing
the preposterior analysis of a Markov chain operating in the steady state.





It is assumed that n is determined before the sample is obtained. Prior to
sampling, the conditional probability, given $\tilde{P} = P$, of obtaining a
specific sample x with the statistic (u, F) is given by (7.3.1). The
number of samples of size n with initial state u which have the transition
count F is given by (7.1.5) and, therefore, the conditional probability
of the statistic (u, F) is given by the Whittle-1 probability mass function
as defined by (6.1.37),

$$P[u, F \mid n, P] = f_{W1}(u, F \mid \pi(P), n, P). \tag{7.3.4}$$

The marginal conditional distribution of $\tilde{u}$ is $\pi(P)$ and the marginal
conditional distribution of $\tilde{F}$ is the Whittle-2 distribution,
$f_{W2}(F \mid \pi(P), n, P)$.

When a sample x is obtained from a Markov chain operating in the
steady state where the initial state is unknown and the transition
probability matrix $\tilde{P}$ has the matrix beta distribution with parameter $M'$,
the unconditional distribution of the transition count $\tilde{F}$ is

$$D(F \mid n, M') = \int f_{W2}(F \mid \pi(P), n, P)\, f_{M\beta}(P \mid M')\, dP. \tag{7.3.5}$$

Therefore, by equation (6.5.15), the unconditional distribution of $\tilde{F}$
is the nonstandard beta-Whittle-2 distribution,

$$D(F \mid n, M') = f_{\beta W2}(F \mid n, M'). \tag{7.3.6}$$

It is then easily seen that, if the set $S^*(\bar{P})$ is defined by equation
(7.2.9), the prior distribution of the posterior mean is given by the
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CHAPTER 3 
SPECIFIC RESULTS FOR A 
TWO-STATE MARKOV CHAIN


Many of the matters considered in previous chapters are specialized
to the case of a two-state Markov chain in this chapter. The 2 × 2
transition probability matrix $\tilde{P}$ is assumed to have the matrix beta
distribution and explicit formulas are found for the means and product
moments of the n-step transition probabilities, the steady-state
probabilities, the process gain, and the expected total discounted rewards.
The chapter concludes with a result concerning the selection of an
optimal terminal policy for a two-state process with a special type of
reward structure. Most of the formulas derived here are double infinite
series; it appears doubtful that similar expressions can be obtained
for chains with more than two states.





Let
$$P = \begin{bmatrix} 1-x & x \\ y & 1-y \end{bmatrix}, \qquad 0 \le x, y \le 1 \tag{8.1.1}$$
be the transition probability matrix for a two-state Markov chain. The
eigenvalues of P are the roots of the equation
$$|\lambda I - P| = \lambda^2 - (2-x-y)\lambda + (1-x-y) = 0 \tag{8.1.2}$$
and are found to be
$$\lambda_1 = 1, \tag{8.1.3a}$$
$$\lambda_2 = 1-x-y, \qquad 0 \le x, y \le 1. \tag{8.1.3b}$$

The eigenvalues of P are distinct provided x and y are not both equal to
zero. When $\lambda_1 \ne \lambda_2$, Sylvester's Theorem leads to the spectral decomposition

$$P = \frac{1}{x+y}\begin{bmatrix} y & x \\ y & x \end{bmatrix} + \frac{1-x-y}{x+y}\begin{bmatrix} x & -x \\ -y & y \end{bmatrix}, \qquad x \ne 0 \text{ or } y \ne 0. \tag{8.1.4}$$

Equation (8.1.4) immediately gives the following expressions for the
steady-state vector,

$$\pi(P) = \left(\frac{y}{x+y},\ \frac{x}{x+y}\right), \qquad x \ne 0 \text{ or } y \ne 0, \tag{8.1.5}$$

and the n-step probability matrix,

$$P^n = [p_{ij}^{(n)}] = \frac{1}{x+y}\begin{bmatrix} y & x \\ y & x \end{bmatrix} + \frac{(1-x-y)^n}{x+y}\begin{bmatrix} x & -x \\ -y & y \end{bmatrix}, \qquad n = 1, 2, 3, \ldots \tag{8.1.6}$$
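In particular,
$$p_{12}^{(n)} = \frac{x}{x+y}\left[1 - (1-x-y)^n\right] = x \sum_{k=0}^{n-1} (1-x-y)^k, \qquad n = 1, 2, 3, \ldots;\ x \ne 0 \text{ or } y \ne 0, \tag{8.1.7}$$
and, similarly,
$$p_{21}^{(n)} = \frac{y}{x+y}\left[1 - (1-x-y)^n\right] = y \sum_{k=0}^{n-1} (1-x-y)^k, \qquad n = 1, 2, 3, \ldots;\ x \ne 0 \text{ or } y \ne 0. \tag{8.1.8}$$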

Let the process have the reward matrix

$$R = [r_{ij}] = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \tag{8.1.9}$$

where $r_{ij}$ is the reward earned when the process makes a transition from
state i to state j.


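Let P have the matrix beta distribution with parameter matrix

\[ M = \begin{pmatrix} m & n \\ p & q \end{pmatrix}, \qquad m, n, p, q > 0, \]

so that the joint density function of x and y is

\[ f(x, y \mid M) = \frac{x^{n-1}(1-x)^{m-1}}{B(m,n)} \cdot \frac{y^{p-1}(1-y)^{q-1}}{B(p,q)}, \qquad 0 \le x, y \le 1. \tag{8.1.10} \]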
Thus, x and y are independent random variables, each having the univariate
beta distribution. It is to be noted that equations (8.1.7) and (8.1.8) are
valid for all P, except a set of measure zero relative to the matrix
beta distribution.





It will be convenient to use the hypergeometric coefficient, $(x)_k$,
in the equations of subsequent sections. This coefficient is defined here
and some of its properties are derived.

Let x be any real number and k any nonnegative integer. The
hypergeometric coefficient is defined by

\[ (x)_k = x(x+1)\cdots(x+k-1), \qquad k = 1, 2, 3, \ldots \tag{8.2.1a} \]

\[ (x)_0 = 1. \tag{8.2.1b} \]

If x > 0 it is clear that

\[ (x)_k = \frac{\Gamma(x+k)}{\Gamma(x)} \tag{8.2.2} \]

and in the case x = 1,

\[ (1)_k = k! \tag{8.2.3} \]





Lemma 8.2.1 If $(x)_k$ is the hypergeometric coefficient defined by
(8.2.1), then the following relations hold:

\[ x(x+1)_k = (x)_k (x+k) \tag{8.2.4} \]

\[ x(x+1)_k = (x)_{k+1} \tag{8.2.5} \]

\[ (x)_{k+1} = (x)_k (x+k) \tag{8.2.6} \]

\[ x(x+1)_{k+1} = (x)_k (x+k)(x+k+1) \tag{8.2.7} \]

\[ (x)_k (x+k)_\nu = (x)_{k+\nu} \tag{8.2.8} \]

Proof. Equations (8.2.4) and (8.2.5) follow by writing

\[ x(x+1)_k = x(x+1)\cdots(x+k) = (x)_k (x+k) = (x)_{k+1}. \tag{8.2.9} \]

Equation (8.2.6) follows directly from (8.2.4) and (8.2.5). To obtain
(8.2.7) we use (8.2.6) and (8.2.4) to obtain

\[ x(x+1)_{k+1} = x(x+1)_k (x+k+1) = (x)_k (x+k)(x+k+1). \tag{8.2.10} \]

Equation (8.2.8) follows by direct expansion,

\[ (x)_k (x+k)_\nu = x(x+1)\cdots(x+k-1)(x+k)\cdots(x+k+\nu-1) = (x)_{k+\nu}. \tag{8.2.11} \]

Q.E.D.
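The relations of the lemma are easy to confirm with exact rational arithmetic. The sketch below is illustrative only; the function name `rising` is a hypothetical choice, and the coefficient is taken to be the rising product x(x+1)...(x+k-1) with (x)_0 = 1.

```python
from fractions import Fraction


def rising(x, k):
    """Hypergeometric coefficient (x)_k = x(x+1)...(x+k-1), with (x)_0 = 1."""
    result = Fraction(1)
    for i in range(k):
        result *= Fraction(x) + i
    return result


def lemma_holds(x, k, nu):
    """Check the five relations of the lemma for one choice of x, k, nu."""
    x = Fraction(x)
    return (
        x * rising(x + 1, k) == rising(x, k) * (x + k)              # first relation
        and x * rising(x + 1, k) == rising(x, k + 1)                # second relation
        and rising(x, k + 1) == rising(x, k) * (x + k)              # third relation
        and x * rising(x + 1, k + 1)
        == rising(x, k) * (x + k) * (x + k + 1)                     # fourth relation
        and rising(x, k) * rising(x + k, nu) == rising(x, k + nu)   # fifth relation
    )
```

Because `Fraction` arithmetic is exact, the comparisons are true equalities rather than floating-point approximations.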





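We use the binomial theorem to expand the factor $(1-x-y)^k$ of equation (8.1.7),

\[ (1-x-y)^{k} = \sum_{\nu=0}^{k} \binom{k}{\nu} (-1)^{\nu} y^{\nu} (1-x)^{k-\nu}, \qquad k = 0, 1, 2, \ldots, \quad 0 \le x, y \le 1. \tag{8.3.1} \]

Writing t for the number of steps, since n denotes an element of M in what
follows, we can write

\[ \bar p_{12}^{(t)}(M) = E[p_{12}^{(t)}] = \sum_{k=0}^{t-1} E[x(1-x-y)^{k}] = \sum_{k=0}^{t-1} \sum_{\nu=0}^{k} \binom{k}{\nu} (-1)^{\nu} E[x(1-x)^{k-\nu}] \, E[y^{\nu}]. \tag{8.3.2} \]

For $\alpha = 0, 1, 2, \ldots$,

\[ E[x(1-x)^{\alpha}] = \frac{1}{B(m,n)} \int_0^1 x^{n} (1-x)^{m+\alpha-1} \, dx = \frac{B(m+\alpha, n+1)}{B(m,n)} = \frac{n}{m+n} \cdot \frac{(m)_{\alpha}}{(m+n+1)_{\alpha}} \tag{8.3.3} \]

and

\[ E[y^{\nu}] = \frac{1}{B(p,q)} \int_0^1 y^{p+\nu-1} (1-y)^{q-1} \, dy = \frac{B(p+\nu, q)}{B(p,q)} = \frac{(p)_{\nu}}{(p+q)_{\nu}}. \tag{8.3.4} \]

Thus we have

\[ \bar p_{12}^{(t)}(M) = \frac{n}{m+n} \sum_{k=0}^{t-1} \sum_{\nu=0}^{k} \binom{k}{\nu} (-1)^{\nu} \frac{(m)_{k-\nu} (p)_{\nu}}{(m+n+1)_{k-\nu} (p+q)_{\nu}}, \qquad t = 1, 2, 3, \ldots \tag{8.3.5} \]

The following recurrence relation, which follows immediately from (8.3.5),
is of use for computing successive values of $\bar p_{12}^{(t)}(M)$:

\[ \bar p_{12}^{(t+1)}(M) = \bar p_{12}^{(t)}(M) + \frac{n}{m+n} \sum_{\nu=0}^{t} \binom{t}{\nu} (-1)^{\nu} \frac{(m)_{t-\nu} (p)_{\nu}}{(m+n+1)_{t-\nu} (p+q)_{\nu}}, \qquad t = 1, 2, 3, \ldots \tag{8.3.6} \]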
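As an illustration of how the double series for the mean of the (1,2) element might be evaluated, the following Python sketch uses exact fractions. It rests on assumptions that should be checked against the definitions in use: x = p12 is taken to have a beta distribution with parameters (n, m), y = p21 a beta distribution with parameters (p, q), and the two are independent. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def rising(x, k):
    """Hypergeometric coefficient (x)_k = x(x+1)...(x+k-1), with (x)_0 = 1."""
    result = Fraction(1)
    for i in range(k):
        result *= Fraction(x) + i
    return result


def inner_sum(k, m, n, p, q):
    """Alternating sum over nu for fixed k (the bracketed part of the series)."""
    total = Fraction(0)
    for nu in range(k + 1):
        numerator = rising(m, k - nu) * rising(p, nu)
        denominator = rising(m + n + 1, k - nu) * rising(p + q, nu)
        total += comb(k, nu) * (-1) ** nu * numerator / denominator
    return total


def mean_p12(t, m, n, p, q):
    """Mean t-step probability of moving from state 1 to state 2 (t >= 1)."""
    prefactor = Fraction(n) / (Fraction(m) + Fraction(n))
    return prefactor * sum(inner_sum(k, m, n, p, q) for k in range(t))


def mean_p12_increment(t, m, n, p, q):
    """Amount added when passing from t steps to t + 1 steps."""
    prefactor = Fraction(n) / (Fraction(m) + Fraction(n))
    return prefactor * inner_sum(t, m, n, p, q)
```

With m = 2, n = 3, p = 1, q = 4, one step gives the prior mean 3/5, and two steps give 17/25, which agrees with the direct evaluation 2E[x] - E[x^2] - E[x]E[y]. Because the terms alternate in sign, exact fractions avoid the cancellation error that floating-point evaluation could accumulate for large t.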
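In a similar fashion an expression for $\bar p_{21}^{(t)}(M)$ is easily derived,
using (8.1.8):

\[ \bar p_{21}^{(t)}(M) = \sum_{k=0}^{t-1} \sum_{\nu=0}^{k} \binom{k}{\nu} (-1)^{\nu} \frac{(m)_{k-\nu} (p)_{\nu+1}}{(m+n)_{k-\nu} (p+q)_{\nu+1}}, \qquad t = 1, 2, 3, \ldots \text{ steps}. \tag{8.3.7} \]

For purposes of computation, we have the recurrence relation

\[ \bar p_{21}^{(t+1)}(M) = \bar p_{21}^{(t)}(M) + \sum_{\nu=0}^{t} \binom{t}{\nu} (-1)^{\nu} \frac{(m)_{t-\nu} (p)_{\nu+1}}{(m+n)_{t-\nu} (p+q)_{\nu+1}}, \qquad t = 1, 2, 3, \ldots \tag{8.3.8} \]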
The derivation of (8.3.5) and (8.3.7) depended upon the form of the
expressions (8.1.7) and (8.1.8). Similar expressions cannot be obtained
for $p_{11}^{(t)}$ and $p_{22}^{(t)}$. Thus, the diagonal elements of the mean t-step
transition probability matrix must be computed from the relations

\[ \bar p_{11}^{(t)}(M) = 1 - \bar p_{12}^{(t)}(M) \tag{8.3.9a} \]

and

\[ \bar p_{22}^{(t)}(M) = 1 - \bar p_{21}^{(t)}(M). \tag{8.3.9b} \]

We now verify that $\bar p_{12}^{(t)}(M)$ satisfies the recursive equation (4.2.23).
That is, we shall show that $\bar p_{12}^{(t+1)}(M)$ satisfies

\[ \bar p_{12}^{(t+1)}(M) = \sum_{j=1}^{2} \bar p_{1j}(M) \, \bar p_{j2}^{(t)}(T_{1j}(M)), \tag{8.3.10} \]

where $\bar p_{ij}(M)$ is the expected value of $p_{ij}$ when P has the matrix beta
distribution with parameter M and where $T_{ij}(M)$ is the parameter matrix M
with its (i,j)th element increased by unity.


Sasao 
Do) = ate (8.3.4%a) 
Pos 8D ~— (8.3548 


the sleht side of (8.3.20) is 
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Aol  ¥ 
BR ret J »v (np (pp) 
Crh ened Kg - 


aos Pra) 


=i ke (ra), (p) 
as im “E 2 ( 3 y=)” ee ea : (3.9242) 
PQ Bm ta} vad (month), feraed) 


Veing (8.2.4), equation (8.3.41) aan be mitten 








fot & v (p) a 
fore Fos (8) (et) (Dicey Ye ape ee ae 
Bare ke wd (nened), (pea), prgty = arity 

(8.3.23) 
Since 
q & eg & Eigey ed hed (Bo 3ehte). 
prqiv mtreZdlcey mintitkey piqev 
(8.3.43) becones, Upon apnlyim (8.2.6), 
eo, & | (rm), (Cp) 
—— | +’ (604 = wri 
ion) yet) crprrhd ny Pry) ag 
fot k {m) (p) 
#5 of (ety ee (8.3045) 
ye ae mtd) ey PQ) 
oma ond neting that (") + Fo cf) 
Letting § @ vid in the first om noting ae x) wie 
wo obtain 
ant (Ding ase 
7 le 2 s Jeo 
(PDices 


wich, upon letting jo it! ami collecting terms, is ae +4) (1). &S 


required. A similar derivation shows that $\bar p_{21}^{(t)}(M)$, as defined by (8.3.7),
satisfies (4.2.23).
8.4 Expected Value of Products of n-Step Transition Probabilities.


Using (8.1.7), we have, for fixed P,
& 


pod poh 
pi) ax: ‘¢ (vag (Belted) 
Ca) 
ar, therefore, 
koi Sik eee 
at(By > yy st (Ft) (02)" By] s x (reg), (84.2) 
Since 
Borie, me2) (n) C 
E [¥° (16%) } SS cence 8 hl =), (8.0.3) 
% B(man) (men), (ment2) 
CPD g4s2p 00. 
we have 
) (mj, Mel Aok 
scetyy = Me AEE tye? Mgmt My 
(iain). $0 bed wd ~=«(CV (mente) (prq 
dle 
Meise, ove (8.4.8) 
Similarivy, ty equation (8.1.8). 
of} 
(MPa PEE omg), (8005) 
pan igo) 
and 
>  froh pal sok (2) acinar Pd 
oy le ese Ct) ee we (8.15.6) 
a (an)... (pra) 


wee 
Ply 2s cee 


Siiteu 


Finally, 





(HY) (2) pal pol Sate 
Peo Pog 


‘2 Bn, = ww fs & (famayjJ" (8.43.73 
SA kao 


we hsve 


yf p) eS pal pod 
4p Pet edhe &D td 


heap: é aes +* € 





Frgegeve (8.8.8) 
The same method can be used to derive more general product
moments.





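8.5 Steady-State Probabilities.

We now obtain expressions for the means and product moments of
$\pi(P) = (\pi_1, \pi_2)$. Silver [23], treating the special case where y is
known and x has the beta distribution, has shown that $E[\pi_1]$ is a Gaussian
hypergeometric function.* By Theorem 4.2.5, $\lim_{t \to \infty} E[p_{ij}^{(t)}] = E[\pi_j]$ and, using equations (8.3.7)
and (8.3.5), we immediately have

\[ E[\pi_1] = \sum_{k=0}^{\infty} \sum_{\nu=0}^{k} \binom{k}{\nu} (-1)^{\nu} \frac{(m)_{k-\nu} (p)_{\nu+1}}{(m+n)_{k-\nu} (p+q)_{\nu+1}} \tag{8.5.1} \]

\[ E[\pi_2] = \frac{n}{m+n} \sum_{k=0}^{\infty} \sum_{\nu=0}^{k} \binom{k}{\nu} (-1)^{\nu} \frac{(m)_{k-\nu} (p)_{\nu}}{(m+n+1)_{k-\nu} (p+q)_{\nu}} \tag{8.5.2} \]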


Theorem 4.2.5 implies that the series (8.5.1) and (8.5.2) both converge.
We shall show that they converge conditionally. Neglecting the constant
multiplier $n/(m+n)$, and noting that

\[ \binom{k}{\nu} = \frac{(1)_k}{(1)_{\nu} (1)_{k-\nu}}, \]

the series of absolute values corresponding to (8.5.2) is

\[ \sum_{k=0}^{\infty} \sum_{\nu=0}^{k} \frac{(1)_k \, (m)_{k-\nu} (p)_{\nu}}{(1)_{\nu} (1)_{k-\nu} \, (m+n+1)_{k-\nu} (p+q)_{\nu}} = \sum_{\mu=0}^{\infty} \sum_{\nu=0}^{\infty} \frac{(1)_{\mu+\nu} \, (m)_{\mu} (p)_{\nu}}{(m+n+1)_{\mu} (p+q)_{\nu} \, \mu! \, \nu!} = F_2(1; m, p; m+n+1, p+q; 1, 1), \tag{8.5.3} \]

where $F_2(\alpha; \beta, \beta'; \gamma, \gamma'; x, y)$ is Appell's second hypergeometric function of
two arguments [2]. Since $F_2(\alpha; \beta, \beta'; \gamma, \gamma'; x, y)$ diverges whenever $|x| + |y| > 1$,
the series (8.5.3) diverges and, therefore, the series (8.5.2) converges
conditionally. A similar proof establishes the conditional convergence of
(8.5.1).

* Erdélyi [47], ch. 2.

It is easily verified that $E[\pi_1]$ and $E[\pi_2]$ satisfy equation (4.2.40a).
Let $\bar\pi_j(M) = E[\pi_j]$ (j = 1, 2). Then it must be shown that $\bar\pi_j$ satisfies











tal = 


ke kgs 
We shall consider the case j = 2; the proof for j = 1 is similar.


For j = 2 the right side of (8.5.4) is


2 
FGM =F FCA ON) ReyD> te? (8.508) 


rn se “)fe8)” ho tay? P) at ae ae Wey  ¥ r 
130 = ( Pra) mn piq (nati) (peqti) 
o® 
| oi e (8555) 
2 Low Se ot a , 
(PD) g 7 tP),, (Dp) prety) (p) 


—_ (8.5.6) 


Ga), (ra) (Peary) (pra), 


em, therefore. that (8.5.5) ia aqual Tk). 


a Zhe 

By Theoren 4.2.8, seed tea o BF, 7] axl we obtain the 
following equatiens from (8.4.6)» (8.l3.8), cmd (8.5). 

(ra) 

(mn 


greov'P) val (8-5-7) 


gateoy Pt" ye 


HA Je ss zy 4) 


, £8058} 





co of jee tk eo (ma) , (p) | 
& ae of rey wed & (B.5e3) 
Lm, 7) ph stan ee (7, dhe) aa eee 


The series (8.5.7) - (8.5.9) are conditionally convergent. We
illustrate the proof for equation (8.5.8). By Theorem 4.2.8, the double
infinite series (8.5.8) converges. Neglecting the constant multiplier


(8.589) 





Uaing (8.5.3) we can uriteo arustion (8.52403 aS 


Babette pelrerr2, pras 1,2) + 2, 2 = ety } a ee (8.50284) 
Stites v 
which diverges. Thus (8.5.8) is conditionally convergent. Similar
proofs show that (8.5.7) and (8.5.9) also converge conditionally.
It can be verified that $E[\pi_i \pi_j]$ satisfies equation (4.2.55). The
algebra is straightforward but tedious and will not be reproduced here.











8.6 Process Gain.

The expected gain of the two-state Markov chain considered in this
chapter is, by (4.4.9),








@t 92 
awe cs ff 7, 4) B A e, : €8a604) 
a 454 ge 1 
If the reward matrix R is given by (8.1.9), then
_ co «= liéiK (m2), (p) 
H)e F S ie : 

ett) 7 ECM fa ion (some), Oe 

n (x) tray DPS) 
tt tb ame , ¥ 

Taras) Hr, 


LB, fn (oy (PI 
+ Gd mm pig arent), ma, (8.6.2) 


Applying (8.2.4), (8.2.5) and (8.2.6), we have








= oo K (rs) (ps) fee 
gp) 2s 5 ()cer” kev vet 0 8 (antev)ettor Bod, 
BmI=0 v0 mnth), (PHU ae 
(8.623) 
8.7 Discounted Reward Vector.

The expected discounted reward over an infinite period when the system
starts in state i is given by equation (4.3.7) as


o” = 5 BYP CD) Bygltld Payee Mady® (8.748) 
3 ee | ee 


ye Eo” RGD. Angetate (847.29 





a nr9 Oo 
For (1,9) = (1,2), we have, by equation (5.3.5), 











oh, 
S00 © Sot Ba of Cen” wines Py 
Aoi = prin bed (mrntd), (pra, 
(ened), (pia) 
p20 isd mend), (Bia), 
(8.7093) 
(m),_.,¢p} 
Sp 
ee. = C04)” pate ee i E Catan) <i, (8.7 4) 
min |\aQ Y (minti), (ptq),, 
= (ma), (p) 
oo a B % 
Bae & FS aAlE Ktet)” ev 
min £20 kd” jusd meerdd, wy PhQ),, 
<p= £ pW“ sp f (pee. (8.765) 
pe eed pe 


The ratio test shows that $\sum_{k=0}^{\infty} (k+1)\beta^{k}$ converges; hence, we may
interchange the first two summation operators in (8.7.3) to obtain


oo Ig (x), Up) 
Bn * k, oi = | 
5 (15) - E : a ( ) 6 (a) bow v r {6.57 46) 
te (ep)(am) ico = ” (nened)), Aco 


Neglecting the constant multiplier, the series of absolute
values corresponding to (8.7.6) is, after rearranging the order of summation,






o oo (ky) P(p), (p) 
= dg *'f ey i 8” 
v0 10 tus (ormet), (rq) 
2 Po(LerePs mri, piqg By B)s (8.7 oF) 


where $F_2(\alpha; \beta, \beta'; \gamma, \gamma'; x, y)$ is Appell's second hypergeometric function of
two variables [21]. Appell has shown that the series (8.7.7) converges if
$0 < \beta < \tfrac{1}{2}$ and diverges if $\tfrac{1}{2} < \beta < 1$. The case $\beta = \tfrac{1}{2}$ has not yet been
investigated. We conclude that (8.7.6) converges absolutely for $0 < \beta < \tfrac{1}{2}$
and, since Theorem 4.3.? implies the convergence of (8.7.6), that the
series converges conditionally for $\tfrac{1}{2} < \beta < 1$.
For (i,j) = (2,1) we use equation (8.3.7) to obtain
S94 (1) a US E zB (&) peony” cae eae ee ; (8.7.8) 
kev PPD) at 
the series converging absolutely for $0 < \beta < \tfrac{1}{2}$ and conditionally for $\tfrac{1}{2} < \beta < 1$.


The remaining cases are


Ae o{ A) ie a , 
Syae) = E BOUL = Be I= Bp 8,0 (8.7.9) 
arc 
Foe 

The expected Giscounted reward starting from state i is 

.. 2 2 2 

Vi(w)= 5 BCH + x ££ S,,(T,. (M)) Bb. (CH “ 8.7.42 
Ag) ue Pf) Fy, Se eck 1967 gE) Pa (HD ‘- =O 


Using the reward matrix (8.1.9) and the formulas (8.7.6) and (8.7.9),
we obtain, upon collecting terms,
ma + nb nf oo i 








7G) = + —= if (= Cen (aa)F 
i's (i~B) (men) = (4oB) (men) = kee 
(nm) j(P),, a(pey) + dq A alirrkeu) + blre4) to 
Carmed) Pred, piqty Brine bakes 7 


In a similar manner we find
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~ op + dq co kK , {m),  (p) 
At ee Se ws, Om 
(298)(prq)  AeB ke ut (en), (pea) 








= stn, eC probtdc+ de 1. 
x - ; (8.7.29) 


srtirtiey prarwel 
It can be shown that $V_1(M)$ and $V_2(M)$ satisfy equation (4.3.9).

8.8 Generalization of a Result of Shor.

N. Z. Shor [37] has considered a game-theoretic model of a two-state
Markov chain with alternatives and rewards. He shows that under certain
circumstances each player should act so as to maximize his expected one-step
transition reward. This result is generalized here.

Consider a two-state Markov chain with $K_i$ alternatives in state
i (i = 1, 2). Assume that the rewards depend only on the initial and final
states i and j and not on the alternative used in making a transition
from i to j. Assume further that the reward matrix is


R K B.8.2; 
Ks rh, C } 
wnere, if r is any real number, 
rit a > CM. bike (8.8.28) 
4 a re? A Kei, cooy K (8.8.26) 
42 q B 9 | 
K Pe > k= i (8.8.2c} 
04 Y 4 w @ e8e09 ‘9 eol ts) 
ey PE + Aye Kal, coos K, (8.8.24) 


We require that (020 and A, 2A, = fale 
Loe ¢ fb, 3) be the matrix of alternative trangition probabili tics 


ww Piles 
4s the narginal distritation fanction of Mr), 4% is assumed that, far 


all Tek, Fo (P lv) 25 contimseus on the boundary of Jt, at 
The expected gain of the system unmisr the policy JT is aS. +). 
Suppoge 441s desired te cheese a noltiay, ont, which maximises the 
expected gain, 
WM TY) s re ee +} ° (8.8.3) 


we shall show that, with the rewanl! structure (5.5.2), 4% 1s sufficient te 
solve the corresponding deterministic problem for eg (tt) ete \t 

and that the optimal policy, ga = (7s 7 )s 4s dotermined by the 
aquations 


o~* . 
a MAX ok ; 
Pot t ) > k=l, 3 oKy ; Py of Y ){ e {=1,2. (8.3.4) 


Let a. 42) be the conditional total expected reward in n 
~ a under the policy CT when the systen starts from state 4 (4121,2) 
ana P(r) =P. Let cad Coa 1) bo the corresponding uncom tional 
exnsoted reward, 


r,t) Jo () oper, (BVT). (8.8 5) 
bey 421 ,2 
WL ely eae 
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Lemme 8.8.4 For TE] 5Zos00 and ali eg 
aint) - Bah <a,. (8.8.6) 


w 
Proof, We chow that, for all Peo ; 


BP Co PD = CEES Ags ty Zyene (8.847) 
£e 
a 
from which equation (8.8.6) fellows, sines & ° &o is a set of 


measure zero relative te F, (2\7). 





oP 25e 


Lot f {u,n) bo the expected maaber of transitions from stato 4 to 


3 
state j inn transitions when the systen starts from state ue. Then, for 


ali ped, Q 


af?) Ts P) o mp + (7, (don) + £,o(20n) ] } A, F, (aan) A £,0( 42%). 


424.2 (8.8.8) 
V931 4 Zy eee 
O- ef 
1% 4 P is represented as in (8.4.1) and the eigenvalues of P are haat and 
S | 


hSlowyvs we can use the spsetral representation of P given ty (5.2.44) 
together with the expresaion for t, goon) given by equation (6.1.15) to 
ebtain 


Ee Yi 
L- Ah, é 
Since P20 and A,2A, = % 


1-20 
a (x4) - of) 2,2) < ate Catyet) A, 





v4 
io a,” 
= oh ; g A © (6.8.20) 
Z~-a, + 
2 
if ZA, <a 1, then 
2 = ho” 
whe ; Seat, A. <O< Ais (8.838.443 
"2 


whereas, if #2 <h, <0, 


4 i," 2 rt 
€> wy Ie ie.) eae 
a a KypA, (LFA FAQ + + i, ) 
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cy & 3eSeke; 
s hod, <A, (6.8.4 


in Gither case, wo obtain (8.8.7). Qelets 
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Lemma 8.8.2 For 1>1,2 end o ef, 


a i Kr, 4) ~ 2 2e( 0,7). (3.8.43) 
n ~ 
Pref. Sineo : 
si” 6) (2) e 2 agen) *uBe (8.8. 3b) 
oul Ne 
equation (6.1.14) ykelds 
a? 2 2 net te 
(ct) = wey pbs Ta? = Eg (Bh, ie (8.8.15) 
WHOPS 
(K), ptt 
B 7 TP, Be ] go Po p 3 Fo (P T)> (8.2.16) 


Let ¢> 0 be given. By a bags extension of Theeren 4.2.5, there 
existe an intecer v> 0 euch that, if k > vu, 
ky <<. | 
[eqtt, BEI - Beltpi1]< 3 (8.8.57) 
Then, for n> vu, 


ris 
Bg (Be I= if Bf fh, ye 














ike) 
A ts : as aw aA te Rev 

<p = |B thm i- & fe, ee nem 

Saf (8.8.48) 
Zor mn sufficientiy large. Thus, 

net aXe} > a ae “% 

asa. Lt a Be F,, Be] Ey Bi, 7] (868249) 

Qn, by (2,6), 
ee ae, +) =%(o,%), (6.8.20) 
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e nm 
Theonen 8.8.3 let £2 (7, 505 ) ve a policy such that 





wl ope 


v3 
B.S Cf) s Wty eceally $ HCH} . 4042 (8.8.22) 
Ther = 
acy )> oer) © ef. (8.8.22) 
Ergof, We Mrst establish by induction that 
iad Co ")> ams, YT). £233,2 (8.8.23) 
mel, pees 
& cf 
For not , -% 
Ge te ee Bt Hae Bet) 
) 
ade x" +) a p++ Poo ‘aneye 2G a (Tp). (8.8.2) 
a « 


Agsume (8.8.23) hoids for n. For i=i, 


s oo, : “ 6 
Bet) = BM) Ces BOCES Hy + BOM) Ce ea, + MCI HD) 
27; e(n) a n) <(n) 2 
ert B Arla +E (cat) =o (aoe) +O (25%). 
(8.8.25) 


Since J * 4a an optimal, poilay for a transition interval of length n, we 
have, for all 0 cf, 


Occ re BMI, + OM oot) = Most) + Marr) 


A 20) 
OFX» by (8.8.24) Gv Leama B.Bel, 
Beatty - oy) 
® 
DOR) = BAYT Ca, + Be, +) - acs 
%Q. (8.3.2?) 
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Betty wre e BAT) Lag + MESH) = Mey] Ber, 
(8.8.28) 
and, since An2A,s 
as ty (oe, ¥) 
>= BRI La, + Oey = rey 
>0, So (8.8.29) 


previrg the inducticn. 
Zquation (8.8.23) and Lemm 8.8.2 together imply that, for all <= eZ, 
ote) = Be 2 arte) 


Moco Nn 


> Ma 2 3% 4) 


n> on 4 
= g(5,'). (8.8.39) 
QeleD. 





CHAPTER 9 
CONCLUDING REMARKS 


In the foregoing chapters we have described a formal structure for
certain broad classes of sequential sampling and fixed sample-size decision
problems in a Markov chain with unknown transition probabilities. Since
there is very little theory in this area most of our efforts have been
directed toward answering questions of existence and convergence. For
this reason the portions of this report which deal with numerical
computation set forth the obvious, but not necessarily the most efficient,
ways to approach problems of calculation. It does seem clear, however,
that, for problems with a large number of states in which a high degree of
accuracy is required, we must think in terms of hours, not minutes, of
computer time. This is not to say that the Bayesian method of dealing
with Markov chains with uncertain transition probabilities must be abandoned
as impractical. But it must be recognized that, at the present state of
the art, the Bayesian treatment is probably most practical for problems with
two or three states, loose prior distributions, and large differences in
the rewards associated with different actions. As problems tend to differ
from these criteria, the decision-maker must balance increasing computation
time against the required accuracy of the solution and choose an appropriate
compromise.

There are numerous questions of immediate interest which remain
unanswered, some theoretical and some numerical. Many of these are listed
below.


1. Certain error bounds were derived in Chapters 3, 4, and 5 which
depend on the discount factor, β, but not on ψ, the parameter of the prior
distribution. These bounds should be made tighter for specific prior
distributions by including factors which involve ψ.


2. The rate of convergence of the successive approximations methods
developed in Chapters 3, 4, and 5 depends upon the choice of terminal
functions. Classes of terminal functions which accelerate this convergence
rate should be investigated.


3. The analysis of undiscounted adaptive control models by letting
β → 1 in the corresponding discounted problem may provide a workable
approach to a difficult matter. The remarks of Section 3.6 are relevant
in this connection.


4. The question of the uniqueness of solutions to equations (4.2.49)
and (4.2.55) is of considerable importance for the calculation of the
means and product moments of the steady-state vector π when a method
of successive approximations is used. The problem of the convergence of
the approximant π(n, ψ), as defined by equation (4.2.42) with the
terminal function (4.2.52), is also of importance.


5. In the terminal control models of Chapter 5 it is necessary to
evaluate expressions of the form


Veo ety oe Sac.4} 


S's +) & me Sar.) 4 


wo Shee 


At present the only method of finding the maximizing policy σ* is by direct
search over the elements of Σ. More efficient methods of finding σ* should
be investigated; approximations to σ* of the sort described in Section 5.¥
should also be studied.


6. A formal analysis of the undiscounted terminal control models II
and IV, which were introduced in Section 5.5, should be carried out. This
analysis would examine such questions as the existence and uniqueness of
solutions, the convergence of successive approximation methods, and whether
a terminal decision point is reached with probability one in an optimal
sampling strategy. In this regard it is to be noted that equations (5.5.2)
and (5.5.3) can be made more precise by replacing the expression


mae SRC r et It 


by the expression 
mats G(r yt) + ae, r) 


where $u_i(\sigma, \psi)$ is the expected relative value* of starting the system in
state i and operating it indefinitely under the policy σ when the prior
distribution function of the generalized stochastic matrix is H with
parameter ψ. Methods of computing $u_i(\sigma, \psi)$ have not yet been studied.


7. There are well-known difficulties in assigning a multivariate
prior distribution to the elements of the generalized stochastic matrix in
such a manner as to accurately reflect the decision-maker's state of
knowledge. It would be of considerable interest, therefore, to investigate
the sensitivity of some of the models of foregoing chapters to relatively
small changes in the prior distribution.








* Cf. Howard [22], Ch. 4.





In addition to these and other immediate questions which arise in
connection with the research reported in this study, there are several
fairly obvious directions in which this research can be extended. For
example, many of the results and techniques developed here should be
capable of extension to decision problems in a semi-Markov chain in which
both the transition probabilities and the parameters of the holding-time
distributions are uncertain.

More general stochastic processes should be amenable to Bayesian
analysis, although different techniques than those utilized here may be
required. The Wiener process, for example, can be analyzed with the
existing Bayesian theory for normal processes.
APPENDIX A
GLOSSARY OF SYMBOLS


Meaning
Beta function.
Generalized beta function.
Discount factor.


Sampling cost when system is
observed in state i.


Expected one-step sampling cost
when process is in state i and
alternative k is to be used.
Maximum sampling cost.

Covariance operator.

Error of nth successive approximant
in adaptive and terminal control
problems.


Error of the terminal decision σ.


Expectation operator.
Sum of ith row of transition count.


Sum of ith column of transition count.


Number of transitions from state i
to state j in n transitions when
process starts from state u.


Expected value of $f_{ij}(u, n)$.


Multivariate beta probability
density function.


Nonstandardized multivariate beta
probability density function.

£49 





ae fic 


Defined 
— Peenine OD Face 
8 
fing (F | wy tight) Rete-hittle probability mass 492 
a - function. 
g . (F | ty!) BetaWhittle=2 probability 195 
pve S = 8 mass funntion. 
Pipe | RaM) Nonstandard bete-“hittle-2 196 
mass function. 
Koll 
oe P ‘ P \™) Matrix beta probability density 180 
= funstion. 
ee (P | as MH) Honstandardized matrix beta 196 
probabliity density function. 
it ) (PIE y) Matrix betee4 probabhli ty density 190 
funotion. 
ei 
‘ Oe | woryP) Whittle probability mass function £53 
io (ay | Delp P) Writtle-1 prebsbtlity mase 464% 
oa funstion. 
eho) | patie?) WhAttlee? probability mass 162 
fanction. 
Fo if, I Transition count. 153 
A (5) m) Maltivariate bete probability 17% 
P > distribution function. 
F.(P |) om MP(+) Marginal distribution function 8 
~~ ‘s = of the fi rows of © specified 
ty Le 
FlceBeB* yey’ sey)  — Appelll’s second hypergeometric a 
funation of tao arguments. 
a Family of ter distribution Ny 
Rumetiona, WP lv Ye 
a(P) Expected reward per transition in the 2168 
~ steady state, or process gain, when 
Ly a B. 
ety) or eo, 1%) Expected gein under the policy 9; 169 
I (x) Cama function. am 
HC@\Y) Probabllity distribation funetien for 8 


the generalised stochastio matrix ¢ « 








oy (+ Bn yt) 


‘tiled Cio ++ oe) 


Rt)» 
(gCH) 90 009% (4)) 


Te (n, 1) o 


(if (np T docce ott (mst)> 
Tie) 
<P gl Bovettes? 


Pal Ugh FE ) 


Pe (ny? d 


i 
Heanine, 


Fanily of probebbiity cletribution 
funations, (7 (+). 


Idkeliboes function fer the sample z,° 


oe x , generalised stochastic matrix 
K> WH). 


An N x N oteehastle matrix conalsting 
of 8 rvva of © specified by go. 


Tranapose of the matrix Pe 
Exposted vaine of Py 5° 


n-step transition probabiiity 
WhieGn 2 = Be 


Expected nestep transition probabd lity 
urmier the polioy wT. 


Steady-state nrebakdlity vector 
of an ergodie Markov chain when 
Dm P, 

& 


Expected staady-state probability 
Fantors 


zhe nth suecessive appromimation to 
wa {(¥>e 


Expeeted value of ay 7s 

Set of oi tranaition counts of 
eo n witle: start in state u ar 
end in state v when P is the matrix 
ef twancition pro ties. 


Sat ef all transition counts ef aise 
m whieh start in state u when P 

43 the matrix of tranci den 

provact litios. 


Set ef all transition counts of 
chee n when P 48 the matri:: of 
transition probabilities. 


wal 


G4 


73 


152 


£62 


sign), 
si a?) 


ry 
G 
F goltis?) 


#,,(usven) 


F 5: (ttpn) 


e+ 


«a2 Sho 


Sat of ali transition counts 

of sise n which start am ex in 
the samo state when Pis the 
mateiz of transition” prebailities. 


Set of all transition eaunts of 
eise n which start, and end in 
different states when P is the 
matrix of transition probabild thes. 


Set of ati transition counts of sise 
m wiles start in state u and end in 
state v when the matrix ef transition 
probabilities is positive. 


Set of ali transition counts of 
size n wileh start in state u when the 
matrix of transition probabiititios is 
positive. 


Set of all transitéion eoumts of aize 
n when the matrix of transition 
probabiiities is positive. 


Set of all transition counts of aise 
n whieh start and eri 4n the sume 
etate when the matrix of transition 
probebilities ia positive. 


Se% of all transition counts of aize 
n which start and ond in different 
states wien the matrix of transi Mon 
probabilities 4a positave. 


Ceoneria mpnbol for tho psrancters of 
a probabdiity distribution function. 


A@riasable parameter act. 

True atate of nature. 

Expected one-atep transition reverd 
when tho syaten 45 in state i 

and alternative k is te be used. 
The mwestep transition probebility 


under the poliey © when @ is the 
trus state of nature. 
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192 


195 





iP, & Bet) Oo? 


a 8H) 


Expected discounted reward in 
nm transitions when the systen 
starts in atate 4. 


The K x N matrix of cno-step 
transition rewards. 


Tas 8 x N mateix of one-step tranei tion 


rowuards consisting of the N rows of 
® specified ty x. 


Maximum element of & . 
Minian element of € . 


Tho elezent of & with the 
largest absolute value. 


The clenent of @ with the sxallest 
absolute value. 


ee set of a randem vector with 
the nonstermdardiszed mitivariate 
beta distri bution. 


Range set of a random matrix «with 
the monsterdardiced nateix beta 
aistribution. 


Set of all K x N goreraliced 
stochastie matrices. 


Sot of alld B 2 BS steohastic 
matrices. 


Sot of all N x N positive stochastic 
MAtrEL COS « 


Set of all WN x N stochastic matrices 
with elements in the clese? interval 
[ae toa}. 

Poliay vaotor. 


 preens VALUE oF 
aptaen)e , (39m) 


Set of fi poiley vectors, LT. 


an. Fare 





Defined 

‘ ft) Faranster of the posterior 12 

distritation of ¢ when the 

paraneter of the nrior distributicn 

is * and a transition from state i 

ta state j under altermative & fa 

observed. 
tt? Paramster of the posterior 12 


cdistribatien of when the 
parameter of the prior distrinution 
4s + end o transition fron state 4 
to state j is observed. 


B grote) Parameter of the posterlor 130 
distribution of & when the parancter of 
the prior distribution is 1, the 
system starts in state 1 andi So observed in state j 
after n transitions urmier the polfay CF. 





v, ( Y) Empeeted total discomted reward over 8) 
an infirate poried when the aysten £26 
starts in state 4 and an eptinal 430 

pompling atrategy is followed. 
v, (nm, 1) Tho nth suscesaive anpreximation byS 
é to v(t ) 123 
vaCrs v) Expected total reward over a period 485 
with terminal operation phase of $46 


length v when the system starts fron 
state 1 and an optimal canmpling 
stratecy ic followed. 


%O%2Z) total discsunted reward over 16 
en infimite peried when the systen 
sterte in state 4 with the policy = 
in force and an optimel sampling 
gtratezy is followed (disountod 
process with setewp cost). 


v Minimnus of a set of constant tersinal eh 
reward Senotieons. 

y Machu of a sat of constant terminal 6%, 
resard funotions. 

¥,( y) forminal. reward funetion. bf 


v Round om the terminal reward fanotions. iv 





Vg SE) 


¥,C5 0+) or (4) 


Vy (at) 
var[*]or varie 
Pa = (5, oy p0eeem) 





Renected total discounted reward 
oven, Sit AROSA Ne: esr a ee Ee Ps 
the oyoten starts from etate 

2,5 and a isod miiay is usa. 


Expected total discounted reward 
over an infinite perled when the 
eyaten starts from state 4 ani tho 
polfey gc is used. 


Dany mMiceess4ve approxinetion 
to %,(t). 


Tae varlance operator. 


A eample of n + 1 states cocunied 
by a Harkey chain. 
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42 





READ 


ALFA] 


ALFA 


GAMMA 


APPENDIX B


PROGRAM VITERATION TO SOLVE
EQUATIONS (3.2.1) AND (3.2.2).


RPROGRAM NAME IS VITERATION
RTHIS PROGRAM RECURSIVELY COMPUTES VALUES OF V(I, T1, M) FOR
RI=1,...,N AND T1=S,...,T, FOLLOWING AN OPTIMAL POLICY. THE
RREWARD MATRIX IS R AND THE TERMINAL REWARD VECTOR IS RHO.
RTHE MAXIMIZATION IS OVER THE MU(I) ALTERNATIVES IN STATE I.
RBETA IS THE DISCOUNT FACTOR. A MATRIX BETA PRIOR IS
RASSUMED.


PROGRAM COMMON ReRHOsRDIMs IND sMUeNesBETAs INDI] oLISTSTOP> 
OINDIMsPOL 

INTEGER INDeMUsNoIND1] s TOP sSeTsI eJeK sMAXSP sN1 sPOL V2 
DIMENSION R(t(SO00eRDIM) sM(500esRDIM) sRHO(10) sIND(LOOsINDIM? « 
OMU(10)sIND1 (10) sLIST(21000)9V1(10)sV2(10) 

VECTOR VALUES RDIM=39120:0 

VECTOR VALUES INDIM=29120 

READ FORMAT INPUT1» NoSeTsMAXSP»sBETA 

PRINT FORMAT OUTIA»s BETA 

RDIM(2)=N 

RDIM(3)=N 

INDIM(2)2N 

IND1(1)=0 

IND(151)=0 

THROUGH ALFAl»s FOR K=leleKeEeN 

IND(19K+1)=K*EN 

NI=N#N 

THROUGH ALFAs FOR K=lolsKeEeMAXSP 

TIND1(K+2)2K#N 

IND(K+101)=K#N1 

THROUGH ALFAs FOR T=lolslIeEeN 

IND(K+4+1 5004] )=K#NI+I4N 

READ FORMAT INPUT29 MU(1) eeo MUN) 

PRINT FORMAT OUTIEs (I=LsleTeGeNs MUC(!I)) 

READ FORMAT INPUT3»® R(LTelsl1) eoo RIMUIN) oNoN)d MCLols13 
0... M(MU(N),N,N), RHO(1) ... RHO(N)
PRINT FORMAT OUT1D, (I=1,1,I.G.N, RHO(I))
PRINT FORMAT OUT1B, (I=1,1,I.G.N, (K=1,1,K.G.MU(I),
0(J=1,1,J.G.N, M(IND(IND1(K)+I)+J))))
PRINT FORMAT OUT1C, (I=1,1,I.G.N, (K=1,1,K.G.MU(I),
0(J=1,1,J.G.N, R(IND(IND1(K)+I)+J))))
SET LIST TO LIST
LIST = 0
THROUGH DELTA, FOR K=S,1,K.G.T
THROUGH GAMMA, FOR I=1,1,I.G.N
V1(I)=VMAX.(I,K,M)
GAMMA V2(I)=POL
PRINT FORMAT OUT2, K, V1(1) ... V1(N)
DELTA PRINT FORMAT OUT3, V2(1) ... V2(N)
TRANSFER TO READ


RFORMAT SPECIFICATIONS

VECTOR VALUES INPUT1=$4I10,F10.5*$
VECTOR VALUES INPUT2=$10I7*$
VECTOR VALUES INPUT3=$(7F10.5)*$
VECTOR VALUES OUT1A=$7H1BETA =,G15.5*$
VECTOR VALUES OUT1B=$6H0  M =,8G15.5/(8G15.5)*$
VECTOR VALUES OUT1C=$6H0  R =,8G15.5/(8G15.5)*$
VECTOR VALUES OUT1D=$6H0RHO =,8G15.5/8G15.5*$
VECTOR VALUES OUT1E=$5H0MU =,10I5*$
VECTOR VALUES OUT2=$8H0FOR T =,I2,14H  V(I, T, M) =,7G15.5/
08G15.5*$
VECTOR VALUES OUT3=$7H POLICY,10I5*$

END OF PROGRAM


EXTERNAL FUNCTION (I1, N1, M)
ENTRY TO VMAX.

RTHIS FUNCTION RECURSIVELY COMPUTES MAX V(I1,N1,M)=Y, THE
RMAXIMUM EXPECTED RETURN IN N1 STEPS IF THE SYSTEM STARTS IN
RSTATE I1 WITH PARAMETER MATRIX M. PRIOR DISTRIBUTION IS
RMATRIX BETA. MAXIMIZATION IS OVER THE MU(I1)
RALTERNATIVES IN STATE I1.

PROGRAM COMMON R,RHO,RDIM,IND,MU,N,BETA,IND1,LIST,TOP,
0INDIM,POL
INTEGER I1,N1,I,N2,K,IND,MU,N,RDIM,J,IND1,TOP,POL
DIMENSION R(500,RDIM),RHO(10),RDIM(3),IND(100,INDIM),
0LIST(21000),MU(10),IND1(10),INDIM(2),TM(500,RDIM)
I=I1
N2=N1
Y=-1.E35
WHENEVER N2.E.0, FUNCTION RETURN RHO(I)
THROUGH ALFA, FOR K=1,1,K.G.MU(I)
MSUM=0.
THROUGH PHI, FOR J=1,1,J.G.N
PHI MSUM=MSUM+M(IND(IND1(K)+I)+J)
STOR=0.
THROUGH GAMMA, FOR J=1,1,J.G.N
SAVE RETURN
SAVE DATA N2,POL,MSUM,STOR,Y,M(K,I,J) ... M(MU(I),I,N),I,J,K
EXECUTE TR1.(I,J,K,M,TM)
X=VMAX.(J,N2-1,TM)
RESTORE DATA K,J,I,M(MU(I),I,N) ... M(K,I,J),Y,STOR,MSUM,
0POL,N2
RESTORE RETURN
GAMMA STOR=STOR+(M(IND(IND1(K)+I)+J)/MSUM)*(R(IND(IND1(K)+I)+J)
0+BETA*X)
WHENEVER STOR .LE. Y, TRANSFER TO ALFA
Y=STOR
POL=K
ALFA CONTINUE
FUNCTION RETURN Y
END OF FUNCTION
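
For readers without access to a MAD compiler, the recursion carried out by VMAX can be sketched in Python. This is only an illustrative sketch under stated assumptions, not the program used in the report: states and alternatives are numbered from 0, the parameter array is a nested list m[k][i][j] indexed by alternative, origin state and destination state, and the names vmax and update are hypothetical. Each recursive call receives its own copy of the parameters, which replaces the SAVE DATA / RESTORE DATA bookkeeping of the listing.

```python
def update(m, k, i, j):
    """Posterior parameters after one observed transition i -> j under alternative k."""
    tm = [[row[:] for row in alt] for alt in m]  # copy, so the caller's m is untouched
    tm[k][i][j] += 1
    return tm


def vmax(i, n, m, r, rho, beta, mu):
    """Return (maximum expected discounted return, maximizing alternative)
    for state i with n steps remaining.

    m[k][i][j] : matrix beta parameters, r[k][i][j] : transition rewards,
    rho[i]     : terminal reward, beta : discount factor,
    mu[i]      : number of alternatives available in state i.
    """
    if n == 0:
        return rho[i], None
    best, pol = float("-inf"), None
    for k in range(mu[i]):
        row_sum = sum(m[k][i])
        stor = 0.0
        for j in range(len(rho)):
            x, _ = vmax(j, n - 1, update(m, k, i, j), r, rho, beta, mu)
            stor += (m[k][i][j] / row_sum) * (r[k][i][j] + beta * x)
        if stor > best:  # strict inequality: ties keep the earlier alternative
            best, pol = stor, k
    return best, pol
```

The running time grows exponentially with n, as it does for the recursive listing, so the sketch is practical only for short horizons and small state spaces.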


EXTERNAL FUNCTION (I1,J1,K1,M,TM)
ENTRY TO TR1.

RTHIS FUNCTION EFFECTS THE TRANSFORMATION FROM THE PRIOR
RPARAMETER MATRIX M TO THE POSTERIOR PARAMETER MATRIX
RTR1.(I1,J1,K1,M)=TM, WHEN A TRANSITION IS OBSERVED FROM I1 TO
RJ1 UNDER ALTERNATIVE K1. PRIOR DISTRIBUTION IS MATRIX BETA.

PROGRAM COMMON R,RHO,RDIM,IND,MU,N,BETA,IND1,LIST,TOP,
0INDIM,POL
INTEGER I1,J1,K1,I,J,K,IND,MU,N,IND1
DIMENSION R(500,RDIM),RHO(10),RDIM(3),IND(100,INDIM),
0LIST(21000),MU(10),IND1(10),INDIM(2)
THROUGH ALFA, FOR I=1,1,I.G.N
THROUGH ALFA, FOR J=1,1,J.G.N
THROUGH ALFA, FOR K=1,1,K.G.MU(I)
ALFA TM(IND(IND1(K)+I)+J)=M(IND(IND1(K)+I)+J)
TM(IND(IND1(K1)+I1)+J1)=M(IND(IND1(K1)+I1)+J1)+1.0
FUNCTION RETURN

END OF FUNCTION 
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APPENDIX C 


PROGRAM PHI MATRIX TO 
COMPUTE EQUATION (4.1.2) 


RPROGRAM NAME IS PHI MATRIX 

RTHIS PROGRAM RECURSIVELY COMPUTES VALUES OF PHI(I,J,T1,M)
RFOR I,J=1,...,N AND T1=S,...,T. A MATRIX BETA
RPRIOR IS ASSUMED.

PROGRAM COMMON IND,N,J,LIST,TOP,MDIM
INTEGER N,IND,I,J,S,T,K,TOP
DIMENSION M(100,MDIM),IND(10),LIST(21000),F(10)
VECTOR VALUES MDIM=2,1,0
READ READ FORMAT INPUT1, N,S,T
MDIM(2)=N
IND(1)=0
THROUGH ALFA, FOR K=1,1,K.E.N
ALFA IND(K+1)=K*N
READ FORMAT INPUT2, M(1,1) ... M(N,N)
PRINT FORMAT OUT1, N,S,T, (K=1,1,K.G.N,(L=1,1,L.G.N,
0M(IND(K)+L)))
SET LIST TO LIST
LIST=0
THROUGH GAMMA, FOR K=S,1,K.G.T
THROUGH GAMMA, FOR I=1,1,I.G.N
THROUGH DELTA, FOR J=1,1,J.E.N
DELTA F(J)=PHI.(I,K,M)
F(N)=1.0
THROUGH EPS, FOR J=1,1,J.E.N
EPS F(N)=F(N)-F(J)
WHENEVER I.E.1
PRINT FORMAT OUT2, K, F(1) ... F(N)
OTHERWISE
PRINT FORMAT OUT3, F(1) ... F(N)
GAMMA END OF CONDITIONAL
TRANSFER TO READ


RFORMAT SPECIFICATIONS

VECTOR VALUES INPUT1=$3I10*$
VECTOR VALUES INPUT2=$(7F10.5)*$
VECTOR VALUES OUT1=$3H1N=,I5,4H S=,I5,4H T=,I5/
0(1H ,8G15.5)*$
VECTOR VALUES OUT2=$7H0FOR T=,I2,15H  PHI(I,J,T,M)=,
06G15.5*$
VECTOR VALUES OUT3=$S24, 6G15.5*$

END OF PROGRAM

EXTERNAL FUNCTION (I1, T1, M)
ENTRY TO PHI.

RTHIS FUNCTION RECURSIVELY COMPUTES PHI(I1,J,T1,M)=Y, THE
RPROBABILITY THAT AT TIME T1 THE SYSTEM WILL BE IN STATE J,
RGIVEN THAT AT TIME 0 IT WAS IN STATE I1 WITH PARAMETER
RMATRIX M. PRIOR IS MATRIX BETA.

PROGRAM COMMON IND,N,J,LIST,TOP,MDIM
INTEGER I1,J,T1,I,T,K,N,IND,TOP,MDIM
DIMENSION IND(10),LIST(21000),MDIM(2),TM(100,MDIM)
T=T1
I=I1
MSUM=0.
THROUGH ALFA, FOR K=1,1,K.G.N
ALFA MSUM=MSUM+M(IND(I)+K)
WHENEVER T.E.1, FUNCTION RETURN M(IND(I)+J)/MSUM
Y=0.
THROUGH BETA, FOR K=1,1,K.G.N
SAVE RETURN
SAVE DATA Y,T,MSUM,M(1,1) ... M(N,N),I,K
EXECUTE TR.(I,K,M,TM)
X=PHI.(K,T-1,TM)
RESTORE DATA K,I,M(N,N) ... M(1,1),MSUM,T,Y
RESTORE RETURN
BETA Y=Y+(M(IND(I)+K)/MSUM)*X
FUNCTION RETURN Y
END OF FUNCTION
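
The recursion in PHI can be restated compactly in Python. This is a minimal sketch under stated assumptions, not the program used in the report: states are numbered from 0, the parameter matrix m is a list of rows of positive integers, and the names phi and update are hypothetical. Exact rational arithmetic is used so that the results can be checked by hand.

```python
from fractions import Fraction


def update(m, i, k):
    """Posterior parameter matrix after one observed transition i -> k."""
    tm = [row[:] for row in m]  # copy, so the caller's m is untouched
    tm[i][k] += 1
    return tm


def phi(i, j, t, m):
    """Mean t-step probability of moving from state i to state j (t >= 1)
    when the transition matrix has a matrix beta prior with parameter m."""
    row_sum = sum(m[i])
    if t == 1:
        return Fraction(m[i][j], row_sum)
    return sum(
        Fraction(m[i][k], row_sum) * phi(k, j, t - 1, update(m, i, k))
        for k in range(len(m))
    )
```

With m = [[1, 1], [1, 1]] the sketch gives phi(0, 0, 2, m) = 7/12 and phi(0, 1, 2, m) = 5/12, which sum to one. The main program relies on the same property when it obtains F(N) by subtraction instead of a further recursive call.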


EXTERNAL FUNCTION (I,K,M,TM)
ENTRY TO TR.

RTHIS FUNCTION EFFECTS THE TRANSFORMATION FROM THE PRIOR
RPARAMETER MATRIX M TO THE POSTERIOR PARAMETER MATRIX
RT.(I,K,M)=TM, WHEN ONE TRANSITION FROM I TO K IS OBSERVED.
RPRIOR IS MATRIX BETA.

PROGRAM COMMON IND,N,J,LIST,TOP,MDIM
INTEGER I,K,IND,J,L,N,J1,MDIM,TOP
DIMENSION IND(10),MDIM(2),LIST(21000)
THROUGH ALFA, FOR J1=1,1,J1.G.N
THROUGH ALFA, FOR L=1,1,L.G.N
ALFA TM(IND(J1)+L)=M(IND(J1)+L)
TM(IND(I)+K)=TM(IND(I)+K)+1.0
FUNCTION RETURN
END OF FUNCTION
APPENDIX D


PROGRAM PIAPROX TO COMPUTE
EQUATIONS (4.2.42).


RPROGRAM NAME IS PIAPROX. THIS PROGRAM RECURSIVELY COMPUTES
RVALUES OF THE SUCCESSIVE APPROXIMANT PI(I,T1,M) FOR
RI=1,...,N AND T1=S,...,T. A MATRIX BETA PRIOR IS USED.

PROGRAM COMMON IND,N,LIST,MDIM,N1,ADIM,AIND
INTEGER N,IND,I,K,S,T,N1,AIND
DIMENSION M(100,MDIM),IND(10),LIST(21000),F(10),AIND(10)
VECTOR VALUES MDIM=2,1,0
VECTOR VALUES ADIM=2,1,0
READ READ FORMAT INPUT1, N,S,T
MDIM(2)=N
N1=N+1
ADIM(2)=N1
AIND(1)=0
IND(1)=0
THROUGH ALFA, FOR K=1,1,K.E.N
AIND(K+1)=K*N1
ALFA IND(K+1)=K*N
READ FORMAT INPUT2, M(1,1) ... M(N,N)
PRINT FORMAT OUT1, N,S,T,(K=1,1,K.G.N,(I=1,1,I.G.N,
0M(IND(K)+I)))
SET LIST TO LIST
LIST=0
THROUGH GAMMA, FOR K=S,1,K.G.T
THROUGH DELTA, FOR I=1,1,I.G.N
DELTA F(I)=PI.(I,K,M)
PRINT FORMAT OUT2, K, F(1) ... F(N)
SUM=0.
THROUGH BETA, FOR I=1,1,I.G.N
BETA SUM=SUM+F(I)
THROUGH EPS, FOR I=1,1,I.G.N
EPS F(I)=F(I)/SUM
PRINT FORMAT OUT3, F(1) ... F(N)
GAMMA PRINT FORMAT OUT4, SUM
TRANSFER TO READ


RFORMAT SPECIFICATIONS.

VECTOR VALUES INPUT1=$3I10*$
VECTOR VALUES INPUT2=$(7F10.5)*$
VECTOR VALUES OUT1=$3H1N=,I5,4H S=,I5,4H T=,I5/
0(1H ,8G15.5)*$
VECTOR VALUES OUT2=$7H0FOR T=,I2,10H  PI(T,M)=,(6G15.5)*$
VECTOR VALUES OUT3=$19H NORMALIZED VECTOR=,(6G15.5)*$
VECTOR VALUES OUT4=$1H ,S11,7HC(T,M)=,G15.6*$

END OF PROGRAM


EXTERNAL FUNCTION(J1,T1,M)
ENTRY TO PI.

RTHIS FUNCTION RECURSIVELY COMPUTES PI(J1,T1,M), THE T1TH
RSUCCESSIVE APPROXIMANT TO THE J1TH ELEMENT OF THE MEAN
RSTEADY-STATE PROBABILITY VECTOR WHEN THE PRIOR IS MATRIX
RBETA WITH PARAMETER M.

PROGRAM COMMON IND,N,LIST,MDIM,N1,ADIM,AIND
INTEGER J1,T1,I,J,K,N,T,IND,MDIM,N1,ADIM,AIND
DIMENSION IND(10),LIST(21000),MDIM(2),TM(100,MDIM),PBAR(10),
0ADIM(2),AIND(10)
J=J1
T=T1
THROUGH ALFA, FOR K=1,1,K.G.N
MSUM=0.
THROUGH BETA, FOR I=1,1,I.G.N
BETA MSUM=MSUM+M(IND(K)+I)
ALFA PBAR(K)=M(IND(K)+J)/MSUM
Y=0.
THROUGH GAMMA, FOR K=1,1,K.G.N
SAVE RETURN
SAVE DATA Y,T,PBAR(K) ... PBAR(N),K,J, M(1,1) ... M(N,N)
M(IND(K)+J)=M(IND(K)+J)+1.
WHENEVER T.G.1, TRANSFER TO ZETA
X=PIZRO.(K,M)
TRANSFER TO ETA
ZETA X=PI.(K,T-1,M)
ETA RESTORE DATA M(N,N) ... M(1,1),J,K,PBAR(N) ... PBAR(K),T,Y
RESTORE RETURN
GAMMA Y=Y+X*PBAR(K)
FUNCTION RETURN Y
END OF FUNCTION


EXTERNAL FUNCTION(L1,M)
ENTRY TO PIZRO.

RTHIS FUNCTION COMPUTES THE TERMINAL FUNCTION PI(L1,0,M)
RAS THE L1TH ELEMENT OF THE STEADY-STATE PROBABILITY VECTOR
RCORRESPONDING TO THE MEAN OF THE PRIOR DISTRIBUTION.
RPRIOR IS MATRIX BETA WITH PARAMETER M.

PROGRAM COMMON IND,N,LIST,MDIM,N1,ADIM,AIND
INTEGER L1,L,N,I,J,K,N1,IND,MDIM,ADIM,AIND
DIMENSION IND(10),LIST(21000),MDIM(2),ADIM(2),A(110,ADIM),
0AIND(10)
L=L1
THROUGH ALFA, FOR I=1,1,I.G.N
MSUM=0.
THROUGH BETA, FOR K=1,1,K.G.N
BETA MSUM=MSUM+M(IND(I)+K)
THROUGH GAMMA, FOR K=1,1,K.E.N
GAMMA A(AIND(K)+I)=-M(IND(I)+K)/MSUM
A(AIND(N)+I)=1.
ALFA A(AIND(I)+N1)=0.
A(AIND(N)+N1)=1.
THROUGH DELTA, FOR K=1,1,K.E.N
A(AIND(K)+K)=A(AIND(K)+K)+1.
SCRAP=A(AIND(K)+L)
A(AIND(K)+L)=A(AIND(K)+N)
DELTA A(AIND(K)+N)=SCRAP
DIAG=A(AIND(1)+1)
THROUGH EPS, FOR J=2,1,J.G.N
EPS A(AIND(1)+J)=A(AIND(1)+J)/DIAG
THROUGH ZETA, FOR J=2,1,J.G.N
THROUGH ETA, FOR I=J,1,I.G.N
SUB=A(AIND(I)+J)
THROUGH IOTA, FOR K=1,1,K.E.J
IOTA SUB=SUB-A(AIND(I)+K)*A(AIND(K)+J)
ETA A(AIND(I)+J)=SUB
DIAG=A(AIND(J)+J)
THROUGH ZETA, FOR I=J+1,1,I.G.N1
SUB=A(AIND(J)+I)
THROUGH LAMBDA, FOR K=1,1,K.E.J
LAMBDA SUB=SUB-A(AIND(J)+K)*A(AIND(K)+I)
ZETA A(AIND(J)+I)=SUB/DIAG
FUNCTION RETURN A(AIND(N)+N1)
END OF FUNCTION
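
The two functions PI and PIZRO can be sketched together in Python. This is an illustrative sketch under stated assumptions, not the program used in the report: states are numbered from 0, the parameters are positive integers, the chain defined by the mean matrix is assumed to have a unique steady-state vector, and plain Gauss-Jordan elimination is swapped in for the reduction scheme of the listing. The names mean_matrix, steady_state and pi are hypothetical.

```python
from fractions import Fraction


def mean_matrix(m):
    """Mean transition matrix of a matrix beta prior with parameter m."""
    return [[Fraction(x, sum(row)) for x in row] for row in m]


def steady_state(p):
    """Solve pi = pi P together with sum(pi) = 1, exactly."""
    n = len(p)
    # Rows 0..n-2 hold sum_i pi_i (delta_ik - p_ik) = 0; the last row is sum(pi) = 1.
    a = [[(1 if i == k else 0) - p[i][k] for i in range(n)] + [Fraction(0)]
         for k in range(n - 1)]
    a.append([Fraction(1)] * n + [Fraction(1)])
    for col in range(n):
        piv = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[piv] = a[piv], a[col]
        a[col] = [x / a[col][col] for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [a[i][n] for i in range(n)]


def pi(j, t, m):
    """t-th successive approximant to element j of the mean steady-state vector."""
    if t == 0:
        return steady_state(mean_matrix(m))[j]
    total = Fraction(0)
    for k in range(len(m)):
        tm = [row[:] for row in m]
        tm[k][j] += 1  # posterior after an observed transition k -> j
        total += Fraction(m[k][j], sum(m[k])) * pi(k, t - 1, tm)
    return total
```

With m = [[1, 1], [1, 1]] the sketch gives pi(0, 1, m) = pi(1, 1, m) = 18/35, so the approximants sum to 36/35 and not to one. This is consistent with the main program, which prints both the normalized vector and the sum C(T,M).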
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APPENDIX E 


PROGRAM VASYMP TO COMPUTE
EQUATION (4.3.13).


RPROGRAM NAME IS VASYMP.

RTHIS PROGRAM RECURSIVELY COMPUTES VALUES OF V(I,J,M) FOR
RI=1,...,N, J=S,...,T. THE REWARD MATRIX IS R AND THE TERM-
RINAL REWARD VECTOR IS RHO. A MATRIX BETA PRIOR IS ASSUMED.
RTHE DISCOUNT FACTOR IS BETA.

PROGRAM COMMON R,RHO,RDIM,IND,N,BETA,LIST,TOP
INTEGER N,IND,J,S,T,K,L,TOP
DIMENSION R(100,RDIM),M(100,RDIM),RHO(10),IND(10),V1(10),
0V2(10),LIST(21000)
VECTOR VALUES RDIM=2,1,0
READ READ FORMAT INPUT1, N,S,T,BETA
RDIM(2)=N
IND(1)=0
THROUGH ALFA, FOR K=1,1,K.E.N
ALFA IND(K+1)=K*N
READ FORMAT INPUT2, R(1,1) ... R(N,N), M(1,1) ... M(N,N),
0RHO(1) ... RHO(N)
PRINT FORMAT OUT1A, BETA
PRINT FORMAT OUT1B, (K=1,1,K.G.N,(L=1,1,L.G.N, M(IND(K)+L)))
PRINT FORMAT OUT1C, (K=1,1,K.G.N,(L=1,1,L.G.N, R(IND(K)+L)))
PRINT FORMAT OUT1D, (K=1,1,K.G.N, RHO(K))
THROUGH PHI, FOR K=1,1,K.G.N
PHI V2(K)=0.
SET LIST TO LIST
LIST=0
THROUGH DELTA, FOR K=S,1,K.G.T
THROUGH GAMMA, FOR L=1,1,L.G.N
V1(L)=V.(L,K,M)
GAMMA V2(L)=V1(L)-V2(L)
PRINT FORMAT OUT2, K, V1(1) ... V1(N)
PRINT FORMAT OUT3, V2(1) ... V2(N)
THROUGH DELTA, FOR L=1,1,L.G.N
DELTA V2(L)=V1(L)
TRANSFER TO READ


RFORMAT SPECIFICATIONS

VECTOR VALUES INPUT1=$3I10,F10.5*$
VECTOR VALUES INPUT2=$(7F10.5)*$
VECTOR VALUES OUT1A=$8H1BETA  =,G15.5*$
VECTOR VALUES OUT1B=$8H0   M  =,8G15.5/(1H ,8G15.5)*$
VECTOR VALUES OUT1C=$8H0   R  =,8G15.5/(1H ,8G15.5)*$
VECTOR VALUES OUT1D=$8H0 RHO  =,8G15.5/8G15.5*$
VECTOR VALUES OUT2=$8H0FOR T =,I2,14H  V(I, T, M) =,
07G15.5/8G15.5*$
VECTOR VALUES OUT3=$S3,21HDELTA V( I, T-1, M) =,7G15.5/
08G15.5*$

END OF PROGRAM


EXTERNAL FUNCTION (I1, J1, M)
ENTRY TO V.

RTHIS FUNCTION RECURSIVELY COMPUTES V.(I1,J1,M)=Y, THE TOTAL
REXPECTED DISCOUNTED RETURN IN J1 STEPS IF THE SYSTEM STARTS
RIN STATE I1 WITH PARAMETER MATRIX M. PRIOR IS MATRIX BETA.

PROGRAM COMMON R,RHO,RDIM,IND,N,BETA,LIST,TOP
INTEGER I1,J1,I,J,K,IND,N,RDIM,TOP
DIMENSION R(100,RDIM),RHO(10),RDIM(2),IND(10),LIST(21000),
0TM(100,RDIM)
I=I1
J=J1
WHENEVER J .E. 0, FUNCTION RETURN RHO(I)
MSUM=0.
THROUGH ALFA, FOR K=1,1,K.G.N
ALFA MSUM=MSUM+M(IND(I)+K)
Y=0.
THROUGH GAMMA, FOR K=1,1,K.G.N
SAVE RETURN
SAVE DATA J,Y,MSUM,M(I,K) ... M(I,N),I,K
EXECUTE TR.(I,K,M,TM)
X=V.(K,J-1,TM)
RESTORE DATA K,I,M(I,N) ... M(I,K),MSUM,Y,J
RESTORE RETURN
GAMMA Y=Y+(M(IND(I)+K)/MSUM)*(R(IND(I)+K)+BETA*X)
FUNCTION RETURN Y
END OF FUNCTION
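
The recursion in V is the fixed-policy counterpart of the PHI recursion with rewards and discounting added. A minimal Python sketch follows, under stated assumptions and not as the program used in the report: states are numbered from 0, m and r are lists of rows, and the names v and update are hypothetical.

```python
def update(m, i, k):
    """Posterior parameter matrix after one observed transition i -> k."""
    tm = [row[:] for row in m]  # copy, so the caller's m is untouched
    tm[i][k] += 1
    return tm


def v(i, n, m, r, rho, beta):
    """Total expected discounted return in n steps starting from state i.

    m[i][k] : matrix beta parameters, r[i][k] : transition rewards,
    rho[i]  : terminal reward, beta : discount factor.
    """
    if n == 0:
        return rho[i]
    row_sum = sum(m[i])
    total = 0.0
    for k in range(len(m)):
        x = v(k, n - 1, update(m, i, k), r, rho, beta)
        total += (m[i][k] / row_sum) * (r[i][k] + beta * x)
    return total
```

For m = [[1, 1], [1, 1]], r = [[1, 0], [0, 2]], rho = [0, 0] and beta = 0.5, the sketch gives v(0, 1, ...) = 0.5 and v(0, 2, ...) = 11/12. The differences between successive values of n correspond to the DELTA V line printed by the main program.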

EXTERNAL FUNCTION (I, K, M, TM)
ENTRY TO TR.

RTHIS FUNCTION EFFECTS THE TRANSFORMATION FROM THE PRIOR
RPARAMETER MATRIX M TO THE POSTERIOR PARAMETER MATRIX
RT.(I,K,M)=TM, WHEN ONE TRANSITION FROM I TO K IS OBSERVED.
RPRIOR IS MATRIX BETA.

PROGRAM COMMON R,RHO,RDIM,IND,N,BETA,LIST,TOP
DIMENSION R(100,RDIM),RHO(10),RDIM(2),IND(10),LIST(21000)
INTEGER I,K,IND,J,L,N
THROUGH ALFA, FOR J=1,1,J.G.N
THROUGH ALFA, FOR L=1,1,L.G.N
ALFA TM(IND(J)+L)=M(IND(J)+L)
TM(IND(I)+K)=TM(IND(I)+K)+1.0
FUNCTION RETURN
END OF FUNCTION
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